Special Topic: Large Multimodal Models (2025)
SCIS Selected Articles on Large Language Models (LLM)
REVIEW

Large language models meet text-centric multimodal sentiment analysis: a survey
Yang H, Zhao Y Y, Wu Y, et al
Sci China Inf Sci, 2025, 68(10): 200101
Keywords: text-centric; multimodal sentiment analysis; large language models; survey
Cite as: Yang H, Zhao Y Y, Wu Y, et al. Large language models meet text-centric multimodal sentiment analysis: a survey. Sci China Inf Sci, 2025, 68(10): 200101, doi: 10.1007/s11432-024-4593-8

SCIS Selected Articles on Large Language Models (LLM)
RESEARCH PAPER

VideoChat: chat-centric video understanding
Li K C, He Y N, Wang Y, et al
Sci China Inf Sci, 2025, 68(10): 200102
Keywords: video understanding; large language model; multi-modality learning; large multimodal models; spatiotemporal perception
Cite as: Li K C, He Y N, Wang Y, et al. VideoChat: chat-centric video understanding. Sci China Inf Sci, 2025, 68(10): 200102, doi: 10.1007/s11432-024-4321-9

RESEARCH PAPER

UniAnimate: taming unified video diffusion models for consistent human image animation
Wang X, Zhang S W, Gao C X, et al
Sci China Inf Sci, 2025, 68(10): 200103
Keywords: video generation; human image animation; diffusion model; large multi-modal models; temporal modeling
Cite as: Wang X, Zhang S W, Gao C X, et al. UniAnimate: taming unified video diffusion models for consistent human image animation. Sci China Inf Sci, 2025, 68(10): 200103, doi: 10.1007/s11432-024-4592-3

RESEARCH PAPER

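SoulSearch: applying heuristic optimization to enhance text-to-image generation with personalized human-LMM collaboration
Shu Y B, Tian T, Zhang P, et al
Sci China Inf Sci, 2025, 68(10): 200104
Keywords: multimodal; text-to-image; heuristic optimization; human-AI collaboration; AI application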
Cite as: Shu Y B, Tian T, Zhang P, et al. SoulSearch: applying heuristic optimization to enhance text-to-image generation with personalized human-LMM collaboration. Sci China Inf Sci, 2025, 68(10): 200104, doi: 10.1007/s11432-024-4564-6

RESEARCH PAPER

From unimodal to multimodal: a framework for generating high-quality multimodal emotional chit-chat dialogue
Yang H, Zheng T, Zhao Y Y, et al
Sci China Inf Sci, 2025, 68(10): 200105
Keywords: multimodal; emotional dialogue; data generation
Cite as: Yang H, Zheng T, Zhao Y Y, et al. From unimodal to multimodal: a framework for generating high-quality multimodal emotional chit-chat dialogue. Sci China Inf Sci, 2025, 68(10): 200105, doi: 10.1007/s11432-024-4591-x

RESEARCH PAPER

Consistent multimodal pre-training for visual tokenization
Pan T, Tang L L, Wang X L, et al
Sci China Inf Sci, 2025, 68(10): 200106
Keywords: foundation model; multimodal; representation learning; visual tokenization
Cite as: Pan T, Tang L L, Wang X L, et al. Consistent multimodal pre-training for visual tokenization. Sci China Inf Sci, 2025, 68(10): 200106, doi: 10.1007/s11432-024-4603-x

RESEARCH PAPER

MULTI: multimodal understanding leaderboard with text and images
Zhu Z C, Xu Y, Chen L, et al
Sci China Inf Sci, 2025, 68(10): 200107
Keywords: multimodal; large language model; logic reasoning; image comprehension; benchmark
Cite as: Zhu Z C, Xu Y, Chen L, et al. MULTI: multimodal understanding leaderboard with text and images. Sci China Inf Sci, 2025, 68(10): 200107, doi: 10.1007/s11432-024-4602-x

SCIS Selected Articles on Large Language Models (LLM)
LETTER

Progressive language-aware encoding and decoding for referring expression comprehension
Zhao Y C, Chen Y X, Rong Y, Xiong S W
Sci China Inf Sci, 2025, 68(10): 200111
Keywords: referring expression comprehension; vision-and-language; visual grounding; multimodal fusion and reasoning; multimodal transformer
Cite as: Zhao Y C, Chen Y X, Rong Y, et al. Progressive language-aware encoding and decoding for referring expression comprehension. Sci China Inf Sci, 2025, 68(10): 200111, doi: 10.1007/s11432-024-4312-9

LETTER

LOVECon: text-driven training-free long video editing with ControlNet
Liao Z Y, Xie Q S, Deng Z J
Sci China Inf Sci, 2025, 68(10): 200112
Keywords: deep generative model; conditional diffusion models; text-driven editing; video editing; video interpolation
Cite as: Liao Z Y, Xie Q S, Deng Z J. LOVECon: text-driven training-free long video editing with ControlNet. Sci China Inf Sci, 2025, 68(10): 200112, doi: 10.1007/s11432-024-4596-1