Publications on computer graphics and computer vision - SCIENCE CHINA Information Sciences

Highly Cited in 202603 | Graphics and Images | RESEARCH PAPER | Cited in SCI: 100

Fewer is more: efficient object detection in large aerial images
Xie, Xingxing; Cheng, Gong; Li, Qingyang; Miao, Shicheng; Li, Ke; Han, Junwei
Sci China Inf Sci, 2024, 67(1): 112106

Keywords: efficient object detection; large aerial images; objectness activation network

Cite as: Xie X X, Cheng G, Li Q Y, et al. Fewer is more: efficient object detection in large aerial images. Sci China Inf Sci, 2024, 67: 112106, doi: 10.1007/s11432-022-3718-5

Special Topic: Large Multimodal Models
Graphics and Images | RESEARCH PAPER | Cited in SCI: 77

How far are we to GPT-4V? Closing the gap to commercial multimodal models with open-source suites
Chen, Zhe; Wang, Weiyun; Tian, Hao; Ye, Shenglong; Gao, Zhangwei; Cui, Erfei; Tong, Wenwen; Hu, Kongzhi; Luo, Jiapeng; Ma, Zheng; Ma, Ji; Wang, Jiaqi; Dong, Xiaoyi; Yan, Hang; Guo, Hewei; He, Conghui; Shi, Botian; Jin, Zhenjiang; Xu, Chao; Wang, Bin; Wei, Xingjian; Li, Wei; Zhang, Wenjian; Zhang, Bo; Cai, Pinlong; Wen, Licheng; Yan, Xiangchao; Dou, Min; Lu, Lewei; Zhu, Xizhou; Lu, Tong; Lin, Dahua; Qiao, Yu; Dai, Jifeng; Wang, Wenhai
Sci China Inf Sci, 2024, 67(12): 220101

Keywords: multimodal model; open-source; vision encoder; dynamic resolution; bilingual dataset; LMM

Cite as: Chen Z, Wang W Y, Tian H, et al. How far are we to GPT-4V? Closing the gap to commercial multimodal models with open-source suites. Sci China Inf Sci, 2024, 67: 220101, doi: 10.1007/s11432-024-4231-5

Special Topic: Large Multimodal Models
Graphics and Images | RESEARCH PAPER | Cited in SCI: 54

OCRBench: on the hidden mystery of OCR in large multimodal models
Liu, Yuliang; Li, Zhang; Huang, Mingxin; Yang, Biao; Yu, Wenwen; Li, Chunyuan; Yin, Xu-Cheng; Liu, Cheng-Lin; Jin, Lianwen; Bai, Xiang
Sci China Inf Sci, 2024, 67(12): 220102

Keywords: large multimodal model; LMM; OCR; text recognition; scene text-centric VQA; document-oriented VQA; key information extraction; handwritten mathematical expression recognition

Cite as: Liu Y L, Li Z, Huang M X, et al. OCRBench: on the hidden mystery of OCR in large multimodal models. Sci China Inf Sci, 2024, 67: 220102, doi: 10.1007/s11432-024-4235-6

Special Topic: Large Multimodal Models
SCIS Selected Articles on Large Language Models (LLM)
Graphics and Images | RESEARCH PAPER | Supplementary | Cited in SCI: 43

Woodpecker: hallucination correction for multimodal large language models
Yin, Shukang; Fu, Chaoyou; Zhao, Sirui; Xu, Tong; Wang, Hao; Sui, Dianbo; Shen, Yunhang; Li, Ke; Sun, Xing; Chen, Enhong
Sci China Inf Sci, 2024, 67(12): 220105

Keywords: multimodal learning; multimodal large language models; hallucination correction; large language models; vision and language; LMM

Cite as: Yin S K, Fu C Y, Zhao S R, et al. Woodpecker: hallucination correction for multimodal large language models. Sci China Inf Sci, 2024, 67: 220105, doi: 10.1007/s11432-024-4251-x

Graphics and Images | RESEARCH PAPER | Cited in SCI: 24

RGB oralscan video-based orthodontic treatment monitoring
Tian, Yan; Fu, Hanshi; Wang, Hao; Liu, Yuqi; Xu, Zhaocheng; Chen, Hong; Li, Jianyuan; Wang, Ruili
Sci China Inf Sci, 2024, 67(1): 112107

Keywords: digital dentistry; object 6D pose estimation; deep learning; computer vision

Cite as: Tian Y, Fu H S, Wang H, et al. RGB oralscan video-based orthodontic treatment monitoring. Sci China Inf Sci, 2024, 67: 112107, doi: 10.1007/s11432-023-3847-x

Graphics and Images | RESEARCH PAPER | Cited in SCI: 22

Robust video question answering via contrastive cross-modality representation learning
Yang, Xun; Zeng, Jianming; Guo, Dan; Wang, Shanshan; Dong, Jianfeng; Wang, Meng
Sci China Inf Sci, 2024, 67(10): 202104

Keywords: video question answering; cross-modality fusion; contrastive learning; cross-media reasoning

Cite as: Yang X, Zeng J M, Guo D, et al. Robust video question answering via contrastive cross-modality representation learning. Sci China Inf Sci, 2024, 67: 202104, doi: 10.1007/s11432-023-4084-6

Special Topic: Large Multimodal Models
SCIS Selected Articles on Large Language Models (LLM)
Graphics and Images | RESEARCH PAPER | Cited in SCI: 16

DocPedia: unleashing the power of large multimodal model in the frequency domain for versatile document understanding
Feng, Hao; Liu, Qi; Liu, Hao; Tang, Jingqun; Zhou, Wengang; Li, Houqiang; Huang, Can
Sci China Inf Sci, 2024, 67(12): 220106

Keywords: document understanding; large multimodal model; LMM; OCR-free; high-resolution; frequency

Cite as: Feng H, Liu Q, Liu H, et al. DocPedia: unleashing the power of large multimodal model in the frequency domain for versatile document understanding. Sci China Inf Sci, 2024, 67: 220106, doi: 10.1007/s11432-024-4250-y

Special Topic: Large Multimodal Models (2025)
SCIS Selected Articles on Large Language Models (LLM)
Graphics and Images | RESEARCH PAPER | Cited in SCI: 11

VideoChat: chat-centric video understanding
Li, Kunchang; He, Yinan; Wang, Yi; Li, Yizhuo; Wang, Wenhai; Luo, Ping; Wang, Yali; Wang, Limin; Qiao, Yu
Sci China Inf Sci, 2025, 68(10): 200102

Keywords: video understanding; large language model; multi-modality learning; large multimodal models; spatiotemporal perception

Cite as: Li K C, He Y N, Wang Y, et al. VideoChat: chat-centric video understanding. Sci China Inf Sci, 2025, 68: 200102, doi: 10.1007/s11432-024-4321-9

Graphics and Images | RESEARCH PAPER | Cited in SCI: 11

COMPrompter: reconceptualized segment anything model with multiprompt network for camouflaged object detection
Zhang, Xiaoqin; Yu, Zhenni; Zhao, Li; Fan, Deng-Ping; Xiao, Guobao
Sci China Inf Sci, 2025, 68(1): 112104

Keywords: segment anything model; camouflaged object detection; boundary; prompt

Cite as: Zhang X Q, Yu Z N, Zhao L, et al. COMPrompter: reconceptualized segment anything model with multiprompt network for camouflaged object detection. Sci China Inf Sci, 2025, 68: 112104, doi: 10.1007/s11432-024-4233-9

Graphics and Images | LETTER | Supplementary | Cited in SCI: 11

SAM3D: zero-shot 3D object detection via the segment anything model
Zhang, Dingyuan; Liang, Dingkang; Yang, Hongcheng; Zou, Zhikang; Ye, Xiaoqing; Liu, Zhe; Bai, Xiang
Sci China Inf Sci, 2024, 67(4): 149101

Keywords: zero-shot 3D object detection; foundation model; segment anything model; BEV perception; segmentation

Cite as: Zhang D Y, Liang D K, Yang H C, et al. SAM3D: zero-shot 3D object detection via the segment anything model. Sci China Inf Sci, 2024, 67: 149101, doi: 10.1007/s11432-023-3943-6

Graphics and Images | RESEARCH PAPER | Cited in SCI: 10

Attention-enhanced computational ghost imaging
Chen, Yifan; Tian, Tong; Lu, Xin; Li, Chen; Zhu, Ruolan; Sun, Zhe; Li, Xuelong
Sci China Inf Sci, 2025, 68(6): 162104

Keywords: computational ghost imaging; attention mechanism; deep learning; self-supervised; speckle pattern

Cite as: Chen Y F, Tian T, Lu X, et al. Attention-enhanced computational ghost imaging. Sci China Inf Sci, 2025, 68: 162104, doi: 10.1007/s11432-024-4434-5

Special Topic: Large Multimodal Models
Graphics and Images | LETTER | Supplementary | Cited in SCI: 10

ChemDFM-X: towards large multimodal model for chemistry
Zhao, Zihan; Chen, Bo; Li, Jingpiao; Chen, Lu; Wen, Liyang; Wang, Pengyu; Zhu, Zichen; Zhang, Danyang; Li, Yansi; Dai, Zhongyang; Chen, Xin; Yu, Kai
Sci China Inf Sci, 2024, 67(12): 220109

Keywords: LMM; AI for science; instruction tuning; cross-modality; chemistry

Cite as: Zhao Z H, Chen B, Li J P, et al. ChemDFM-X: towards large multimodal model for chemistry. Sci China Inf Sci, 2024, 67: 220109, doi: 10.1007/s11432-024-4243-0

Special Topic: Large Multimodal Models (2025)
SCIS Selected Articles on Large Language Models (LLM)
Graphics and Images | REVIEW | Cited in SCI: 9

Large language models meet text-centric multimodal sentiment analysis: a survey
Yang, Hao; Zhao, Yanyan; Wu, Yang; Wang, Shilong; Zheng, Tian; Zhang, Hongbo; Ma, Zongyang; Che, Wanxiang; Wang, Shijin; Wei, Si; Qin, Bing
Sci China Inf Sci, 2025, 68(10): 200101

Keywords: text-centric; multimodal sentiment analysis; large language models; survey

Cite as: Yang H, Zhao Y Y, Wu Y, et al. Large language models meet text-centric multimodal sentiment analysis: a survey. Sci China Inf Sci, 2025, 68: 200101, doi: 10.1007/s11432-024-4593-8

Graphics and Images | RESEARCH PAPER | Cited in SCI: 9

BEV-Locator: an end-to-end visual semantic localization network using multi-view images
Zhang, Zhihuang; Xu, Meng; Zhou, Wenqiang; Peng, Tao; Li, Liang; Poslad, Stefan
Sci China Inf Sci, 2025, 68(2): 122106

Keywords: visual localization; semantic map; bird's-eye view; transformer; pose estimation

Cite as: Zhang Z H, Xu M, Zhou W Q, et al. BEV-Locator: an end-to-end visual semantic localization network using multi-view images. Sci China Inf Sci, 2025, 68: 122106, doi: 10.1007/s11432-023-4114-6

Graphics and Images | RESEARCH PAPER | Cited in SCI: 8

Relative difficulty distillation for semantic segmentation
Liang, Dong; Sun, Yue; Du, Yun; Chen, Songcan; Huang, Sheng-Jun
Sci China Inf Sci, 2024, 67(9): 192105

Keywords: knowledge distillation; semantic segmentation; relative difficulty; sample weighting; prediction discrepancy

Cite as: Liang D, Sun Y, Du Y, et al. Relative difficulty distillation for semantic segmentation. Sci China Inf Sci, 2024, 67: 192105, doi: 10.1007/s11432-023-4061-2

Graphics and Images | RESEARCH PAPER | Cited in SCI: 7

Aligning enhanced feature representation for generalized zero-shot learning
Fang, Zhiyu; Zhu, Xiaobin; Yang, Chun; Zhou, Hongyang; Qin, Jingyan; Yin, Xu-Cheng
Sci China Inf Sci, 2025, 68(2): 122102

Keywords: generalized zero-shot learning; gated attention mechanism; contrastive learning; multi-modal alignment

Cite as: Fang Z Y, Zhu X B, Yang C, et al. Aligning enhanced feature representation for generalized zero-shot learning. Sci China Inf Sci, 2025, 68: 122102, doi: 10.1007/s11432-023-4174-4

Special Topic: Large Multimodal Models
Graphics and Images | LETTER | Supplementary | Cited in SCI: 7

COMET: "cone of experience" enhanced large multimodal model for mathematical problem generation
Liu, Sannyuya; Feng, Jintian; Yang, Zongkai; Luo, Yawei; Wan, Qian; Shen, Xiaoxuan; Sun, Jianwen
Sci China Inf Sci, 2024, 67(12): 220108

Keywords: mathematical problem generation; mathematical problem solving; large multimodal model; LMM; educational application; smart education

Cite as: Liu S N Y, Feng J T, Yang Z K, et al. COMET: "cone of experience" enhanced large multimodal model for mathematical problem generation. Sci China Inf Sci, 2024, 67: 220108, doi: 10.1007/s11432-024-4242-0

Graphics and Images | RESEARCH PAPER | Supplementary | Cited in SCI: 6

ControlVideo: conditional control for one-shot text-driven video editing and beyond
Zhao, Min; Wang, Rongzhen; Bao, Fan; Li, Chongxuan; Zhu, Jun
Sci China Inf Sci, 2025, 68(3): 132107

Keywords: diffusion models; controllable generation; text-driven editing; video editing; long video editing

Cite as: Zhao M, Wang R Z, Bao F, et al. ControlVideo: conditional control for one-shot text-driven video editing and beyond. Sci China Inf Sci, 2025, 68: 132107, doi: 10.1007/s11432-023-4184-4

Graphics and Images | RESEARCH PAPER | Cited in SCI: 6

Rethinking attribute localization for zero-shot learning
Chen, Shuhuang; Chen, Shiming; Xie, Guo-Sen; Shu, Xiangbo; You, Xinge; Li, Xuelong
Sci China Inf Sci, 2024, 67(7): 172103

Keywords: zero-shot learning; attention mechanism; attribute localization; image classification

Cite as: Chen S H, Chen S M, Xie G-S, et al. Rethinking attribute localization for zero-shot learning. Sci China Inf Sci, 2024, 67: 172103, doi: 10.1007/s11432-023-4051-9

Graphics and Images | MOOP | Video | Cited in SCI: 5

Situation-adaptive neural network for fast pre-computing image enhancement
Li, Xinyue; Duan, Huiyu; Wang, Jia; Liu, Xiaohong; Chen, Yitong; Zhai, Guangtao
Sci China Inf Sci, 2025, 68(2): 124101

Keywords: photonic computing; image enhancement; look-up table; pre-computing; deep learning

Cite as: Li X Y, Duan H Y, Wang J, et al. Situation-adaptive neural network for fast pre-computing image enhancement. Sci China Inf Sci, 2025, 68: 124101, doi: 10.1007/s11432-024-4166-y

Graphics and Images | RESEARCH PAPER | Cited in SCI: 5

PointSmile: point self-supervised learning via curriculum mutual information
Li, Xin; Wei, Mingqiang; Chen, Songcan
Sci China Inf Sci, 2024, 67(11): 212104

Keywords: PointSmile; self-supervised learning; curriculum mutual information; point cloud; representation learning

Cite as: Li X, Wei M Q, Chen S C. PointSmile: point self-supervised learning via curriculum mutual information. Sci China Inf Sci, 2024, 67: 212104, doi: 10.1007/s11432-023-4085-9

Graphics and Images | RESEARCH PAPER | Cited in SCI: 5

Saliency-guided meta-hallucinator for few-shot learning
Zhang, Hongguang; Liu, Chun; Wang, Jiandong; Ma, Linru; Koniusz, Piotr; Torr, Philip H. S.; Yang, Lin
Sci China Inf Sci, 2024, 67(10): 202103

Keywords: few-shot learning; saliency detection; object recognition; anomaly detection; computer vision

Cite as: Zhang H G, Liu C, Wang J D, et al. Saliency-guided meta-hallucinator for few-shot learning. Sci China Inf Sci, 2024, 67: 202103, doi: 10.1007/s11432-023-4113-1

Graphics and Images | LETTER | Supplementary | Cited in SCI: 5

Pyramid-resolution person restoration for cross-resolution person re-identification
Peng, Chunlei; Wang, Bo; Liu, Decheng; Wang, Nannan; Gao, Xinbo
Sci China Inf Sci, 2024, 67(6): 169101

Keywords: pyramid-resolution; cross-resolution person ReID; image restoration; feature distance fusion; multi-resolution person ReID

Cite as: Peng C L, Wang B, Liu D C, et al. Pyramid-resolution person restoration for cross-resolution person re-identification. Sci China Inf Sci, 2024, 67: 169101, doi: 10.1007/s11432-023-4023-y

Graphics and Images | PERSPECTIVE | Cited in SCI: 4

Towards cobodied/symbodied AI: concept and eight scientific and technical problems
Lu, Feng; Zhao, Qinping
Sci China Inf Sci, 2026, 69(1): 116101

Keywords: cobodied AI; symbodied AI; artificial intelligence; dual-brain fusion; physical co-embodiment

Cite as: Lu F, Zhao Q P. Towards cobodied/symbodied AI: concept and eight scientific and technical problems. Sci China Inf Sci, 2026, 69: 116101, doi: 10.1007/s11432-025-4589-x

Graphics and Images | RESEARCH PAPER | Cited in SCI: 4

Individual/joint deblurring and low-light image enhancement in one go via unsupervised deblurring paradigm
Zhao, Suiyi; Zhang, Zhao; Wei, Yanyan; Fan, Jicong; Zhao, Yang; Yan, Shuicheng; Wang, Meng
Sci China Inf Sci, 2025, 68(12): 222104

Keywords: unsupervised learning; image processing; joint restoration; deblurring; low-light enhancement

Cite as: Zhao S Y, Zhang Z, Wei Y Y, et al. Individual/joint deblurring and low-light image enhancement in one go via unsupervised deblurring paradigm. Sci China Inf Sci, 2025, 68: 222104, doi: 10.1007/s11432-023-4562-8

Graphics and Images | MOOP | Video | Cited in SCI: 4

Virtual-physical digital twin testbed for heterogeneous crowd operations
Xu, Mingliang; Wang, Zheng; Meng, Yang; Luo, Wencan; Wang, Hua; He, Shuo; Li, Chaochao; Guo, Yibo; Li, Yafei; Lv, Pei
Sci China Inf Sci, 2025, 68(5): 154101

Keywords: heterogeneous crowd operations; virtual-physical integration; digital twin testbed; physical sandbox; twin simulation; cloud control; interactive interface

Cite as: Xu M L, Wang Z, Meng Y, et al. Virtual-physical digital twin testbed for heterogeneous crowd operations. Sci China Inf Sci, 2025, 68: 154101, doi: 10.1007/s11432-024-4339-5

Graphics and Images | LETTER | Supplementary | Cited in SCI: 4

Multi-receptive field interaction network for shape from polarization
Peng, Yini; Liu, Rui; Zhang, Zhiyuan; Wang, Zhongyuan; Ma, Jiayi; Tian, Xin
Sci China Inf Sci, 2025, 68(1): 119102

Keywords: shape from polarization; 3D reconstruction; deep learning; multi-receptive field interaction; surface normal

Cite as: Peng Y N, Liu R, Zhang Z Y, et al. Multi-receptive field interaction network for shape from polarization. Sci China Inf Sci, 2025, 68: 119102, doi: 10.1007/s11432-024-4212-2

Special Topic: Large Multimodal Models
Graphics and Images | RESEARCH PAPER | Cited in SCI: 4

MMInstruct: a high-quality multi-modal instruction tuning dataset with extensive diversity
Liu, Yangzhou; Cao, Yue; Gao, Zhangwei; Wang, Weiyun; Chen, Zhe; Wang, Wenhai; Tian, Hao; Lu, Lewei; Zhu, Xizhou; Lu, Tong; Qiao, Yu; Dai, Jifeng
Sci China Inf Sci, 2024, 67(12): 220103

Keywords: instruction tuning; multi-modal; multi-domain; dataset; vision large language model; LMM

Cite as: Liu Y Z, Cao Y, Gao Z W, et al. MMInstruct: a high-quality multi-modal instruction tuning dataset with extensive diversity. Sci China Inf Sci, 2024, 67: 220103, doi: 10.1007/s11432-024-4187-3

Graphics and Images | RESEARCH PAPER | Cited in SCI: 4

SeeMore: a spatiotemporal predictive model with bidirectional distillation and level-specific meta-adaptation
Ma, Yuqing; Liu, Wei; Gao, Yajun; Yuan, Yang; Bai, Shihao; Qin, Haotong; Liu, Xianglong
Sci China Inf Sci, 2024, 67(8): 182104

Keywords: spatiotemporal predictive learning; knowledge transfer; bidirectional distillation network; level-specific meta-adapter; coarse-to-fine training

Cite as: Ma Y Q, Liu W, Gao Y J, et al. SeeMore: a spatiotemporal predictive model with bidirectional distillation and level-specific meta-adaptation. Sci China Inf Sci, 2024, 67: 182104, doi: 10.1007/s11432-022-3859-8

Graphics and Images | RESEARCH PAPER | Supplementary | Cited in SCI: 3

RHINO: regularizing the hash-based implicit neural representation
Zhu, Hao; Liu, Fengyi; Zhang, Qi; Ma, Zhan; Cao, Xun
Sci China Inf Sci, 2026, 69(1): 112101

Keywords: implicit neural representation; regularization; static/dynamic neural radiance field; signed distance function

Cite as: Zhu H, Liu F Y, Zhang Q, et al. RHINO: regularizing the hash-based implicit neural representation. Sci China Inf Sci, 2026, 69: 112101, doi: 10.1007/s11432-024-4490-3

Special Topic: Large Multimodal Models (2025)
SCIS Selected Articles on Large Language Models (LLM)
Graphics and Images | LETTER | Supplementary | Cited in SCI: 3

Progressive language-aware encoding and decoding for referring expression comprehension
Zhao, Yichen; Chen, Yaxiong; Rong, Yi; Xiong, Shengwu
Sci China Inf Sci, 2025, 68(10): 200111

Keywords: referring expression comprehension; vision-and-language; visual grounding; multimodal fusion and reasoning; multimodal transformer

Cite as: Zhao Y C, Chen Y X, Rong Y, et al. Progressive language-aware encoding and decoding for referring expression comprehension. Sci China Inf Sci, 2025, 68: 200111, doi: 10.1007/s11432-024-4312-9

Graphics and Images | RESEARCH PAPER | Cited in SCI: 3

A recover-then-discriminate framework for robust anomaly detection
Xing, Peng; Zhang, Dong; Tang, Jinhui; Li, Zechao
Sci China Inf Sci, 2025, 68(4): 142102

Keywords: recovery network; HOG prompt; discriminative network; self-correlation loss; anomaly detection

Cite as: Xing P, Zhang D, Tang J H, et al. A recover-then-discriminate framework for robust anomaly detection. Sci China Inf Sci, 2025, 68: 142102, doi: 10.1007/s11432-024-4291-4

Special Topic: Large Multimodal Models
Graphics and Images | RESEARCH PAPER | Cited in SCI: 3

Modality-experts coordinated adaptation for large multimodal models
Zhang, Yan; Ji, Zhong; Pang, Yanwei; Han, Jungong; Li, Xuelong
Sci China Inf Sci, 2024, 67(12): 220107

Keywords: large multimodal model; LMM; multimodal learning; vision-language pretraining; parameter-efficient fine-tuning; adapter; modality expert

Cite as: Zhang Y, Ji Z, Pang Y W, et al. Modality-experts coordinated adaptation for large multimodal models. Sci China Inf Sci, 2024, 67: 220107, doi: 10.1007/s11432-024-4234-4

Graphics and Images | LETTER | Cited in SCI: 3

Scene text recognition via dual character counting-aware visual and semantic modeling network
Xiao, Ke; Zhu, Anna; Iwana, Brian Kenji; Liu, Cheng-Lin
Sci China Inf Sci, 2024, 67(3): 139101

Keywords: scene text recognition; language model; document analysis; deep learning; attention mechanism

Cite as: Xiao K, Zhu A N, Iwana B K, et al. Scene text recognition via dual character counting-aware visual and semantic modeling network. Sci China Inf Sci, 2024, 67: 139101, doi: 10.1007/s11432-023-3935-8

Graphics and Images | REVIEW | Cited in SCI: 2

Continuous representation methods, theories, and applications: an overview and perspective
Luo, Yisi; Zhao, Xile; Meng, Deyu
Sci China Inf Sci, 2026, 69(5): 151102

Keywords: continuous representation; implicit neural representation; tensor decomposition; compressed sensing; optimization; convergence and generalization

Cite as: Luo Y S, Zhao X L, Meng D Y. Continuous representation methods, theories, and applications: an overview and perspective. Sci China Inf Sci, 2026, 69: 151102, doi: 10.1007/s11432-025-4819-5

Graphics and Images | RESEARCH PAPER | Cited in SCI: 2

Retrieval-and-alignment based large-scale indoor point cloud semantic segmentation
Xu, Zongyi; Huang, Xiaoshui; Yuan, Bo; Wang, Yangfu; Zhang, Qianni; Li, Weisheng; Gao, Xinbo
Sci China Inf Sci, 2024, 67(4): 142104

Keywords: point cloud semantic segmentation; large-scale indoor point clouds; point cloud alignment; overlap estimation; label transfer

Cite as: Xu Z Y, Huang X S, Yuan B, et al. Retrieval-and-alignment based large-scale indoor point cloud semantic segmentation. Sci China Inf Sci, 2024, 67: 142104, doi: 10.1007/s11432-022-3928-x

Graphics and Images | RESEARCH PAPER | Cited in SCI: 1

SpikeCV: open a continuous computer vision era
Zheng, Yajing; Zhang, Jiyuan; Zhao, Rui; Ding, Jianhao; Chen, Shiyan; Wu, Weijian; Xiong, Ruiqin; Yu, Zhaofei; Huang, Tiejun
Sci China Inf Sci, 2026, 69(3): 132106

Keywords: spike camera; datasets; spike-based algorithms; spike vision; open-source Python ecosystem

Cite as: Zheng Y J, Zhang J Y, Zhao R, et al. SpikeCV: open a continuous computer vision era. Sci China Inf Sci, 2026, 69: 132106, doi: 10.1007/s11432-023-4565-6

Graphics and Images | RESEARCH PAPER | Cited in SCI: 1

Person identity shift for privacy-preserving person re-identification
Dou, Shuguang; Jiang, Xinyang; Zhao, Qingsong; Wang, Yansen; Li, Dongsheng; Zhao, Cairong
Sci China Inf Sci, 2026, 69(1): 112103

Keywords: person re-identification; person de-identification; privacy protection

Cite as: Dou S G, Jiang X Y, Zhao Q S, et al. Person identity shift for privacy-preserving person re-identification. Sci China Inf Sci, 2026, 69: 112103, doi: 10.1007/s11432-023-4583-x

Graphics and Images | RESEARCH PAPER | Cited in SCI: 1

BlindDiff: empowering degradation modeling in diffusion models for blind image super-resolution
Li, Feng; Wu, Yixuan; Liang, Zichao; Cong, Runmin; Bai, Huihui; Zhao, Yao; Wang, Meng
Sci China Inf Sci, 2026, 69(1): 112102

Keywords: blind image super-resolution; diffusion model; alternate optimization; degradation modeling

Cite as: Li F, Wu Y X, Liang Z C, et al. BlindDiff: empowering degradation modeling in diffusion models for blind image super-resolution. Sci China Inf Sci, 2026, 69: 112102, doi: 10.1007/s11432-024-4687-2

Special Topic: Large Multimodal Models (2025)
Graphics and Images | RESEARCH PAPER | Cited in SCI: 1

UniAnimate: taming unified video diffusion models for consistent human image animation
Wang, Xiang; Zhang, Shiwei; Gao, Changxin; Wang, Jiayu; Zhou, Xiaoqiang; Zhang, Yingya; Yan, Luxin; Sang, Nong
Sci China Inf Sci, 2025, 68(10): 200103

Keywords: video generation; human image animation; diffusion model; large multi-modal models; temporal modeling

Cite as: Wang X, Zhang S W, Gao C X, et al. UniAnimate: taming unified video diffusion models for consistent human image animation. Sci China Inf Sci, 2025, 68: 200103, doi: 10.1007/s11432-024-4592-3

Graphics and Images | LETTER | Supplementary | Cited in SCI: 1

Orpaint: a zero-shot inpainting model for oracle bone inscription rubbings with visual mamba block
Meng, Zijie; Zeng, Yuanze; Chang, Xiang; Xu, Tianshuo; Chao, Fei; Cao, Xixin; Shang, Changjing; Shen, Qiang
Sci China Inf Sci, 2025, 68(8): 189102

Keywords: oracle bone inscriptions inpainting; zero-shot inpainting; visual mamba; diffusion model; image restoration

Cite as: Meng Z J, Zeng Y Z, Chang X, et al. Orpaint: a zero-shot inpainting model for oracle bone inscription rubbings with visual mamba block. Sci China Inf Sci, 2025, 68: 189102, doi: 10.1007/s11432-024-4493-4

SCIS Selected Articles on Large Language Models (LLM)
Graphics and Images | REVIEW | Cited in SCI: 1

A review of generative models for virtual human motion driving
Wang, Pengcheng; Peng, Haolun; Mao, Siyang; Wang, Shaojiang; Tang, Jie
Sci China Inf Sci, 2025, 68(8): 181102

Keywords: virtual human motion driving; talking-face generation; human-pose generation; Text2Motion; co-speech gesture generation; large language model

Cite as: Wang P C, Peng H L, Mao S Y, et al. A review of generative models for virtual human motion driving. Sci China Inf Sci, 2025, 68: 181102, doi: 10.1007/s11432-023-4284-x

Graphics and Images | LETTER | Supplementary | Cited in SCI: 1

Self-calibrated region-level regression for crowd counting
Zhu, Jiawen; Zhao, Wenda; He, You; Lu, Huchuan
Sci China Inf Sci, 2025, 68(4): 149102

Keywords: crowd counting; region-level regression; self-calibrated learning; unreliable margin attenuation; object counting

Cite as: Zhu J W, Zhao W D, He Y, et al. Self-calibrated region-level regression for crowd counting. Sci China Inf Sci, 2025, 68: 149102, doi: 10.1007/s11432-024-4326-2

Graphics and Images | LETTER | Cited in SCI: 1

DcnnGrasp: towards accurate grasp pattern recognition with adaptive regularizer learning
Zhang, Xiaoqin; Huang, Ziwei; Zheng, Jingjing; Wang, Shuo; Jiang, Xianta
Sci China Inf Sci, 2024, 67(12): 229102

Keywords: grasp pattern recognition; computer vision; convolutional neural networks; deep learning; adaptive regularizer learning

Cite as: Zhang X Q, Huang Z W, Zheng J J, et al. DcnnGrasp: towards accurate grasp pattern recognition with adaptive regularizer learning. Sci China Inf Sci, 2024, 67: 229102, doi: 10.1007/s11432-022-4237-4

Graphics and Images | LETTER | Supplementary | Cited in SCI: 1

Wavelet-domain feature decoupling for weakly supervised multi-object tracking
Li, Yu-Lei; Yan, Yan; Lu, Yang; Wang, Hanzi
Sci China Inf Sci, 2024, 67(8): 189102

Keywords: multi-object tracking; weakly supervised learning; feature-decoupling transformer; noisy intermediate features; well-refined embedding features

Cite as: Li Y-L, Yan Y, Lu Y, et al. Wavelet-domain feature decoupling for weakly supervised multi-object tracking. Sci China Inf Sci, 2024, 67: 189102, doi: 10.1007/s11432-022-4097-y

Graphics and Images | RESEARCH PAPER | Cited in SCI: 1

Towards imbalanced motion: part-decoupling network for video portrait segmentation
Yu, Tianshu; Xia, Changqun; Li, Jia
Sci China Inf Sci, 2024, 67(7): 172104

Keywords: video portrait segmentation; imbalanced motion; unsupervised part decoupling; motion correlation; inter-frame attention

Cite as: Yu T S, Xia C Q, Li J. Towards imbalanced motion: part-decoupling network for video portrait segmentation. Sci China Inf Sci, 2024, 67: 172104, doi: 10.1007/s11432-023-4030-y

Graphics and Images | RESEARCH PAPER | Cited in SCI: 0

Text-guided bidirectional mapping distillation for continual semantic segmentation
Hao, Xuze; Jiang, Xuhao; Ni, Wenqian; Tan, Weimin; Yan, Bo
Sci China Inf Sci, 2026, 69(7): 172106

Keywords: deep learning; continual learning; semantic segmentation; knowledge distillation; text embeddings

Cite as: Hao X Z, Jiang X H, Ni W Q, et al. Text-guided bidirectional mapping distillation for continual semantic segmentation. Sci China Inf Sci, 2026, 69: 172106, doi: 10.1007/s11432-024-4893-x

Graphics and Images | LETTER | Supplementary | Cited in SCI: 0

AS-CAR: adaptive topology evolution with semantic alignment for continual action recognition
Zhu, Xingyu; Shu, Xiangbo; Xu, Binqian; Zhang, Liyan; Tang, Jinhui
Sci China Inf Sci, 2026, 69(6): 169102

Keywords: class-incremental learning; skeleton action; few-shot learning; adaptive topology evolution; semantic structure

Cite as: Zhu X Y, Shu X B, Xu B Q, et al. AS-CAR: adaptive topology evolution with semantic alignment for continual action recognition. Sci China Inf Sci, 2026, 69: 169102, doi: 10.1007/s11432-025-4748-1

Graphics and Images | LETTER | Supplementary | Cited in SCI: 0

Reference patch momentum distillation for open-vocabulary semantic segmentation
Liu, Yajie; Zhang, Jinjin; Liu, Qingjie; Huang, Di
Sci China Inf Sci, 2026, 69(6): 169101

Keywords: vision-language models; open-vocabulary; semantic segmentation; momentum distillation; fine-grained cross-modal alignment

Cite as: Liu Y J, Zhang J J, Liu Q J, et al. Reference patch momentum distillation for open-vocabulary semantic segmentation. Sci China Inf Sci, 2026, 69: 169101, doi: 10.1007/s11432-025-4747-y

Graphics and Images | RESEARCH PAPER | Cited in SCI: 0

DGNet: dynamic gradient-guided network for water-related optical image enhancement
Zhou, Jingchun; He, Zongxin; Zhang, Dehuan; Jiang, Qiuping; Ren, Wenqi; Fu, Xianping; Li, Xuelong
Sci China Inf Sci, 2026, 69(6): 162104

Keywords: water-related optical image; image enhancement; medium noise; feature restoration; dynamically pseudo-labeling

Cite as: Zhou J C, He Z X, Zhang D H, et al. DGNet: dynamic gradient-guided network for water-related optical image enhancement. Sci China Inf Sci, 2026, 69: 162104, doi: 10.1007/s11432-024-4929-2

Graphics and Images | RESEARCH PAPER | Cited in SCI: 0

DeMamba: AI-generated video detection on million-scale GenVideo benchmark
Chen, Haoxing; Hong, Yan; Huang, Zizheng; Xu, Zhuoer; Gu, Zhangxuan; Li, Yaohui; Lan, Jun; Zhu, Huijia; Zhang, Jianfu; Wang, Weiqiang; Li, Huaxiong
Sci China Inf Sci, 2026, 69(6): 162103

Keywords: generative model; video detection; dataset; deepfake; vision mamba

Cite as: Chen H X, Hong Y, Huang Z Z, et al. DeMamba: AI-generated video detection on million-scale GenVideo benchmark. Sci China Inf Sci, 2026, 69: 162103, doi: 10.1007/s11432-024-4894-0

Graphics and Images | RESEARCH PAPER | Supplementary | Cited in SCI: 0

Physiology-augmented multivariate temporal learning for adaptive simulation of electrical excitation in cardiac myocytes
Chen, Yuhang; Li, Yacong; Cui, Jiahao; Li, Shuai; Zhang, Henggui; Hao, Aimin; Zhao, Qinping
Sci China Inf Sci, 2026, 69(6): 162102

Keywords: multivariate time-series prediction; adaptive modeling; deep learning; cardiac electrical activity; cellular computational model

Cite as: Chen Y H, Li Y C, Cui J H, et al. Physiology-augmented multivariate temporal learning for adaptive simulation of electrical excitation in cardiac myocytes. Sci China Inf Sci, 2026, 69: 162102, doi: 10.1007/s11432-024-4844-4

Graphics and Images | LETTER | Cited in SCI: 0

Prototype-guided diffusion alignment for few-shot unsupervised domain adaptation
Sun, Heyang; Geng, Chuanxing; Chen, Songcan
Sci China Inf Sci, 2026, 69(5): 159101

Keywords: prototype learning; few-shot learning; unsupervised domain adaptation; representation learning; diffusion model

Cite as: Sun H Y, Geng C X, Chen S C. Prototype-guided diffusion alignment for few-shot unsupervised domain adaptation. Sci China Inf Sci, 2026, 69: 159101, doi: 10.1007/s11432-025-4783-y

Graphics and Images | RESEARCH PAPER | Cited in SCI: 0

FPP-Former: a transformer-based end-to-end architecture for semantic and structural reconstruction of large-scale floor plate plans
Wang, Jing; Xiong, Haoran; Yan, Zihao; Yu, Qizhi; Gong, Minglun; Huang, Hui
Sci China Inf Sci, 2026, 69(5): 152106

Keywords: floor plate plan; semantic and instance segmentation; semantic and structural reconstruction

Cite as: Wang J, Xiong H R, Yan Z H, et al. FPP-Former: a transformer-based end-to-end architecture for semantic and structural reconstruction of large-scale floor plate plans. Sci China Inf Sci, 2026, 69: 152106, doi: 10.1007/s11432-024-4834-1

Graphics and Images | RESEARCH PAPER | Cited in SCI: 0

TrackSSM: a general motion predictor using a state space model
Hu, Bin; Luo, Run; Liu, Zelin; Wang, Cheng; Liu, Wenyu
Sci China Inf Sci, 2026, 69(5): 152105

Keywords: 2D multi-object tracking; state space model; temporal motion model; hidden state; flow information

Cite as: Hu B, Luo R, Liu Z L, et al. TrackSSM: a general motion predictor using a state space model. Sci China Inf Sci, 2026, 69: 152105, doi: 10.1007/s11432-024-4849-2

Graphics and Images | RESEARCH PAPER | Cited in SCI: 0

Improving image-text alignment with an optimal feature sub-space-aware similarity learning framework
Zhang, Kun; Li, Jingyu; Li, Zhe; Zhang, Huatian; Zhang, Lei; Mao, Zhendong; Zhang, Yongdong
Sci China Inf Sci, 2026, 69(5): 152104

Keywords: image-text alignment; cross-modal semantic measurement; sub-space-aware similarity learning

Cite as: Zhang K, Li J Y, Li Z, et al. Improving image-text alignment with an optimal feature sub-space-aware similarity learning framework. Sci China Inf Sci, 2026, 69: 152104, doi: 10.1007/s11432-024-4845-2

Graphics and Images | RESEARCH PAPER | Cited in SCI: 0

Graph-based topology reasoning for driving scenes
Li, Tianyu; Chen, Li; Wang, Huijie; Li, Yang; Yang, Jiazhi; Geng, Xiangwei; Xu, Hang; Xu, Chunjing; Yan, Junchi; Luo, Ping; Li, Hongyang
Sci China Inf Sci, 2026, 69(5): 152103

Keywords: autonomous driving; topology reasoning; lane perception; traffic element recognition; graph

Cite as: Li T Y, Chen L, Wang H J, et al. Graph-based topology reasoning for driving scenes. Sci China Inf Sci, 2026, 69: 152103, doi: 10.1007/s11432-025-4815-9

Graphics and Images | REVIEW | Cited in SCI: 0

A literature review of literature reviews in pattern analysis and machine intelligence
Zhao, Penghai; Zhang, Xin; Cao, Jiayue; Cheng, Ming-Ming; Yang, Jian; Li, Xiang
Sci China Inf Sci, 2026, 69(5): 151101

Keywords: AI-for-Research; literature review; umbrella study; AI-generated review; bibliometrics

Cite as: Zhao P H, Zhang X, Cao J Y, et al. A literature review of literature reviews in pattern analysis and machine intelligence. Sci China Inf Sci, 2026, 69: 151101, doi: 10.1007/s11432-025-4816-6

Special Topic: Large Multimodal Models (2026)
SCIS Selected Articles on Large Language Models (LLM)
Graphics and Images | RESEARCH PAPER | Cited in SCI: 0

CT-Agent: a multimodal-LLM agent for 3D CT radiology question answering
Mao, Yuren; Xu, Wenyi; Qin, Yuyang; Gao, Yunjun
Sci China Inf Sci, 2026, 69(5): 150107

Keywords: LLM agent; CT radiology question answering; visual question answering; token compression; LoRA fine-tuning

Cite as: Mao Y R, Xu W Y, Qin Y Y, et al. CT-Agent: a multimodal-LLM agent for 3D CT radiology question answering. Sci China Inf Sci, 2026, 69: 150107, doi: 10.1007/s11432-025-4818-7

Special Topic: Large Multimodal Models (2026)
Graphics and Images | RESEARCH PAPER | Cited in SCI: 0

Multimodal 3D object detection for autonomous driving under vision-language supervision: a contrastive-learning perspective
Lin, Chunmian; Zhang, Wenze; Chen, Yanyan; Yang, Lei; Jiang, Han; Tian, Daxin; Duan, Xuting; Zhou, Jianshan; Cao, Dongpu
Sci China Inf Sci, 2026, 69(5): 150106

Keywords: multimodal 3D detection; vision-language model; autonomous driving; contrastive learning; adapter

Cite as: Lin C M, Zhang W Z, Chen Y Y, et al. Multimodal 3D object detection for autonomous driving under vision-language supervision: a contrastive-learning perspective. Sci China Inf Sci, 2026, 69: 150106, doi: 10.1007/s11432-025-4853-5

Special Topic: Large Multimodal Models (2026)
Graphics and Images | RESEARCH PAPER | Cited in SCI: 0

Multimodal prior-augmented text-driven 3D human-object interaction generation
Wang, Yin; Zhang, Ziyao; Leng, Zhiying; Liu, Haitian; Li, Frederick W. B.; Li, Mu; Liang, Xiaohui
Sci China Inf Sci, 2026, 69(5): 150105

Keywords: text-driven motion generation; human-object interaction; multimodal models; diffusion model

Cite as: Wang Y, Zhang Z Y, Leng Z Y, et al. Multimodal prior-augmented text-driven 3D human-object interaction generation. Sci China Inf Sci, 2026, 69: 150105, doi: 10.1007/s11432-025-4809-7

Special Topic: Large Multimodal Models (2026)
SCIS Selected Articles on Large Language Models (LLM)
Graphics and Images | RESEARCH PAPER | Cited in SCI: 0

MNAFT: modality neuron-aware fine-tuning of multimodal large language models for image translation
Li, Bo; Deng, Ningyuan; Dong, Tianyu; Wang, Shaobo; Zhu, Shaolin; Wen, Lijie
Sci China Inf Sci, 2026, 69(5): 150104

Keywords: vision-language models; multilingual image translation; large language models

Cite as: Li B, Deng N Y, Dong T Y, et al. MNAFT: modality neuron-aware fine-tuning of multimodal large language models for image translation. Sci China Inf Sci, 2026, 69: 150104, doi: 10.1007/s11432-025-4914-1

Special Topic: Large Multimodal Models (2026)
Graphics and Images | RESEARCH PAPER | Cited in SCI: 0

Shenbi: accelerating video diffusion models with INT8-attention
Fan, Haishuang; Lv, Ao; Yan, Guihai
Sci China Inf Sci, 2026, 69(5): 150103

Keywords: video diffusion model; model quantization; INT8 attention; GPU acceleration; large multimodal models

Cite as: Fan H S, Lv A, Yan G H. Shenbi: accelerating video diffusion models with INT8-attention. Sci China Inf Sci, 2026, 69: 150103, doi: 10.1007/s11432-025-4915-9

Special Topic: Large Multimodal Models (2026)
Graphics and Images | RESEARCH PAPER | Cited in SCI: 0

Gelm: graph-based Tanimoto similarity grouping pretraining and entropy-guided conformer selection finetuning for large language models
Chen, Zhuo; Wang, Sihan; Chen, Linjiang; Du, Wenjie; Wang, Yang
Sci China Inf Sci, 2026, 69(5): 150102

Keywords: molecular relationship learning; multimodal large language model; pretraining; information entropy; DDI prediction

Cite as: Chen Z, Wang S H, Chen L J, et al. Gelm: graph-based Tanimoto similarity grouping pretraining and entropy-guided conformer selection finetuning for large language models. Sci China Inf Sci, 2026, 69: 150102, doi: 10.1007/s11432-025-4913-5

Special Topic: Large Multimodal Models (2026)
Graphics and Images | RESEARCH PAPER | Cited in SCI: 0

Motion-aware generative frame interpolation
Zhang, Guozhen; Zhu, Yuhan; Cui, Yutao; Zhao, Xiaotong; Ma, Kai; Wang, Limin
Sci China Inf Sci, 2026, 69(5): 150101

Keywords: video generation; diffusion model; video frame interpolation; optical flow; temporal modeling; motion guidance; latent diffusion

Cite as: Zhang G Z, Zhu Y H, Cui Y T, et al. Motion-aware generative frame interpolation. Sci China Inf Sci, 2026, 69: 150101, doi: 10.1007/s11432-025-4920-3

Graphics and Images | RESEARCH PAPER | Cited in SCI: 0

Towards universal X-ray security inspection: a benchmark and stereoscopic-aware oriented prohibited item detection framework
Wang, Tianbo; Liao, Kewei; Zhang, Zhange; Ma, Yuqing; Zhi, Hongping; Liu, Aishan; Gong, Ruihao; Liu, Xianglong
Sci China Inf Sci, 2026, 69(4): 142101

Keywords: X-ray prohibited item detection; oriented object detection; benchmark; stereoscopic rotation; deformation calibration

Cite as: Wang T B, Liao K W, Zhang Z G, et al. Towards universal X-ray security inspection: a benchmark and stereoscopic-aware oriented prohibited item detection framework. Sci China Inf Sci, 2026, 69: 142101, doi: 10.1007/s11432-024-4732-x

Graphics and Images | RESEARCH PAPER | Cited in SCI: 0

Augmenting and contrasting distortion for open panoramic segmentation
Jiang, Jing; Zhao, Sicheng; Zhu, Jiankun; Li, Minghui; Chen, Xi; Yao, Hongxun
Sci China Inf Sci, 2026, 69(3): 132108

Keywords: open vocabulary; panoramic semantic segmentation; distortion adaptation; contrastive learning; image-to-image transformation

Cite as: Jiang J, Zhao S C, Zhu J K, et al. Augmenting and contrasting distortion for open panoramic segmentation. Sci China Inf Sci, 2026, 69: 132108, doi: 10.1007/s11432-025-4668-8

Graphics and Images | RESEARCH PAPER | Cited in SCI: 0

UML: uncertainty-aware and mutual learning for noise-robust cross-lingual cross-modal retrieval
Liu, Yu; Chen, Haipeng; Yang, Xun; Lyu, Yingda; Wang, Meng
Sci China Inf Sci, 2026, 69(3): 132107

Keywords: cross-lingual cross-modal retrieval; noisy correspondence; uncertainty-based learning; mutual information

Cite as: Liu Y, Chen H P, Yang X, et al. UML: uncertainty-aware and mutual learning for noise-robust cross-lingual cross-modal retrieval. Sci China Inf Sci, 2026, 69: 132107, doi: 10.1007/s11432-024-4696-2

Graphics and Images | RESEARCH PAPER | Cited in SCI: 0

Memory-based diverse-category single-view 3D reconstruction
Guo, Haoyu; Li, Ying; Deng, Chunyan
Sci China Inf Sci, 2026, 69(2): 122106

Keywords: single-view reconstruction; neural networks; mesh generation; neural rendering; contrastive learning

Cite as: Guo H Y, Li Y, Deng C Y. Memory-based diverse-category single-view 3D reconstruction. Sci China Inf Sci, 2026, 69: 122106, doi: 10.1007/s11432-024-4543-3

Graphics and Images | RESEARCH PAPER | Cited in SCI: 0

Weakly supervised visual-auditory fixation prediction with multigranularity perception
Wang, Guotao; Chen, Chenglizhao; Fan, Deng-Ping; Hao, Aimin; Zhao, Qinping
Sci China Inf Sci, 2026, 69(2): 122101

Keywords: weakly supervised learning; visual-audio fixation prediction; multigranularity perception

Cite as: Wang G T, Chen C L Z, Fan D-P, et al. Weakly supervised visual-auditory fixation prediction with multigranularity perception. Sci China Inf Sci, 2026, 69: 122101, doi: 10.1007/s11432-024-4744-5

Graphics and Images | RESEARCH PAPER | Cited in SCI: 0

Task prior attention network for multi-task learning of dense prediction
Xu, Yangyang; Yang, Yibo; Zhang, Lefei; Du, Bo
Sci China Inf Sci, 2026, 69(1): 112108

Keywords: scene understanding; multi-task learning; dense prediction; vision transformer; task prior

Cite as: Xu Y Y, Yang Y B, Zhang L F, et al. Task prior attention network for multi-task learning of dense prediction. Sci China Inf Sci, 2026, 69: 112108, doi: 10.1007/s11432-023-4648-7

Graphics and Images | RESEARCH PAPER | Cited in SCI: 0

Prompt tuning with preference ranking for few-shot pre-trained decision transformer
Hu, Shengchao; Shen, Li; Zhang, Ya; Tao, Dacheng
Sci China Inf Sci, 2026, 69(1): 112105

Keywords: reinforcement learning; few-shot learning; preference learning; prompt tuning; ranking optimization

Cite as: Hu S C, Shen L, Zhang Y, et al. Prompt tuning with preference ranking for few-shot pre-trained decision transformer. Sci China Inf Sci, 2026, 69: 112105, doi: 10.1007/s11432-024-4545-1

Graphics and Images | RESEARCH PAPER | Cited in SCI: 0

LeCoT: revisiting network architecture for two-view correspondence pruning
Dai, Luanyuan; Du, Xiaoyu; Tang, Jinhui
Sci China Inf Sci, 2026, 69(1): 112104

Keywords: correspondence pruning; transformer; spatial and channel information; context information; relative pose estimation

Cite as: Dai L Y, Du X Y, Tang J H. LeCoT: revisiting network architecture for two-view correspondence pruning. Sci China Inf Sci, 2026, 69: 112104, doi: 10.1007/s11432-024-4555-x

Graphics and Images | REVIEW | Cited in SCI: 0

Artificial intelligence for virtual reality: a review
Wang, Lili; Xu, Weiwei; Liu, Yebin; Wang, Miao; Wang, Beibei; Yang, Xubo; Xu, Lan; Tan, Zhangyao; Fan, Runze; Wang, Zijun; Wang, Chi; Zhang, Hongwen; Wen, Yijian; Yang, Haozhong; Wu, Jian; Fan, Jiahui; Wang, Hui; Zhang, Qixuan; Wang, Guoping; Wang, Yongtian; Zhao, Qinping
Sci China Inf Sci, 2026, 69(1): 111101

Keywords: virtual reality; artificial intelligence generated content; 3D Gaussian; avatar; physical simulation; interaction

Cite as: Wang L L, Xu W W, Liu Y B, et al. Artificial intelligence for virtual reality: a review. Sci China Inf Sci, 2026, 69: 111101, doi: 10.1007/s11432-024-4541-9

Graphics and Images | RESEARCH PAPER | Cited in SCI: 0

Propagation rectified attack: on improving adversarial transferability
Sun, Xuxiang; Peng, Hongyu; Cheng, Gong; Han, Junwei
Sci China Inf Sci, 2025, 68(12): 222102

Keywords: computer vision; image recognition; adversarial attack; transferability; surrogate refinement

Cite as: Sun X X, Peng H Y, Cheng G, et al. Propagation rectified attack: on improving adversarial transferability. Sci China Inf Sci, 2025, 68: 222102, doi: 10.1007/s11432-024-4542-8

SCIS Selected Articles on Large Language Models (LLM)
Graphics and Images | POSITION PAPER | Cited in SCI: 0

Towards multimodal graph large language model
Wang, Xin; Zhang, Zeyang; Xiao, Linxin; Chen, Haibo; Ge, Chendi; Zhu, Wenwu
Sci China Inf Sci, 2025, 68(11): 213101

Keywords: multimodal graph; large language model; foundation model; graph machine learning; multimodality

Cite as: Wang X, Zhang Z Y, Xiao L X, et al. Towards multimodal graph large language model. Sci China Inf Sci, 2025, 68: 213101, doi: 10.1007/s11432-025-4627-3

Graphics and Images | LETTER | Cited in SCI: 0

Progressive joint distribution alignment network for cross-scene hyperspectral image classification
Xie, Zhuojun; Duan, Puhong; Kang, Xudong; Liu, Wang; Li, Shutao
Sci China Inf Sci, 2025, 68(10): 209102

Keywords: image classification; domain adaptation; hyperspectral image; remote sensing; unsupervised learning

Cite as: Xie Z J, Duan P H, Kang X D, et al. Progressive joint distribution alignment network for cross-scene hyperspectral image classification. Sci China Inf Sci, 2025, 68: 209102, doi: 10.1007/s11432-024-4586-y

Special Topic: Large Multimodal Models (2025)
Graphics and Images | LETTER | Cited in SCI: 0

LOVECon: text-driven training-free long video editing with ControlNet
Liao, Zhenyi; Xie, Qingsong; Deng, Zhijie
Sci China Inf Sci, 2025, 68(10): 200112

Keywords: deep generative model; conditional diffusion models; text-driven editing; video editing; video interpolation

Cite as: Liao Z Y, Xie Q S, Deng Z J. LOVECon: text-driven training-free long video editing with ControlNet. Sci China Inf Sci, 2025, 68: 200112, doi: 10.1007/s11432-024-4596-1

Special Topic: Large Multimodal Models (2025)
SCIS Selected Articles on Large Language Models (LLM)
Graphics and Images | RESEARCH PAPER | Cited in SCI: 0

MULTI: multimodal understanding leaderboard with text and images
Zhu, Zichen; Xu, Yang; Chen, Lu; Yang, Jingkai; Ma, Yichuan; Sun, Yiming; Wen, Hailin; Liu, Jiaqi; Cai, Jinyu; Ma, Yingzi; Zhang, Situo; Zhao, Zihan; Sun, Liangtai; Yu, Kai
Sci China Inf Sci, 2025, 68(10): 200107

Keywords: multimodal; large language model; logic reasoning; image comprehension; benchmark

Cite as: Zhu Z C, Xu Y, Chen L, et al. MULTI: multimodal understanding leaderboard with text and images. Sci China Inf Sci, 2025, 68: 200107, doi: 10.1007/s11432-024-4602-x

Special Topic: Large Multimodal Models (2025)
Graphics and Images | RESEARCH PAPER | Cited in SCI: 0

Consistent multimodal pre-training for visual tokenization
Pan, Ting; Tang, Lulu; Wang, Xinlong; Liu, Xin; Shan, Shiguang
Sci China Inf Sci, 2025, 68(10): 200106

Keywords: foundation model; multimodal; representation learning; visual tokenization

Cite as: Pan T, Tang L L, Wang X L, et al. Consistent multimodal pre-training for visual tokenization. Sci China Inf Sci, 2025, 68: 200106, doi: 10.1007/s11432-024-4603-x

Special Topic: Large Multimodal Models (2025)
Graphics and Images | RESEARCH PAPER | Cited in SCI: 0

From unimodal to multimodal: a framework for generating high-quality multimodal emotional chit-chat dialogue
Yang, Hao; Zheng, Tian; Zhao, Yanyan; Wu, Yang; Yuan, Jianhua; Che, Wanxiang; Wang, Shijin; Wei, Si; Qin, Bing
Sci China Inf Sci, 2025, 68(10): 200105

Keywords: multimodal; emotional dialogue; data generation

Cite as: Yang H, Zheng T, Zhao Y Y, et al. From unimodal to multimodal: a framework for generating high-quality multimodal emotional chit-chat dialogue. Sci China Inf Sci, 2025, 68: 200105, doi: 10.1007/s11432-024-4591-x

Special Topic: Large Multimodal Models (2025)
Graphics and Images | RESEARCH PAPER | Cited in SCI: 0

SoulSearch: applying heuristic optimization to enhance text-to-image generation with personalized human-LMM collaboration
Shu, Yubo; Tian, Tian; Zhang, Peng; Gu, Hansu; Li, Yaqiong; Shao, Yiyang; Lu, Tun; Gu, Ning
Sci China Inf Sci, 2025, 68(10): 200104

Keywords: multimodal; text-to-image; heuristic optimization; human-AI collaboration; AI application

Cite as: Shu Y B, Tian T, Zhang P, et al. SoulSearch: applying heuristic optimization to enhance text-to-image generation with personalized human-LMM collaboration. Sci China Inf Sci, 2025, 68: 200104, doi: 10.1007/s11432-024-4564-6

Graphics and Images | RESEARCH PAPER | Cited in SCI: 0

ROLA: real-world object-centric learning with attention optimization
Tang, Qu; Wang, Haochen; Zhu, Xiangyu; Lei, Zhen; Zhang, Zhaoxiang
Sci China Inf Sci, 2025, 68(9): 192105

Keywords: object-centric learning; object discovery; slot attention; disentangled representation; computer vision

Cite as: Tang Q, Wang H C, Zhu X Y, et al. ROLA: real-world object-centric learning with attention optimization. Sci China Inf Sci, 2025, 68: 192105, doi: 10.1007/s11432-024-4342-6

Graphics and Images | RESEARCH PAPER | Cited in SCI: 0

Self-knowledge distillation with dimensional history knowledge
Huang, Wenke; Ye, Mang; Shi, Zekun; Li, He; Du, Bo
Sci China Inf Sci, 2025, 68(9): 192101

Keywords: knowledge distillation; self-knowledge distillation; dimensional distribution

Cite as: Huang W K, Ye M, Shi Z K, et al. Self-knowledge distillation with dimensional history knowledge. Sci China Inf Sci, 2025, 68: 192101, doi: 10.1007/s11432-023-4283-3

Graphics and Images | MOOP | Cited in SCI: 0

PD-NeRF: a general pseudo-depth supervision method for neural radiance fields
Gu, Jiaming; Jiang, Minchao; Lu, Xiaoyuan; Hua, Cong; Li, Hongsheng; Zhu, Guangming; Zhang, Liang
Sci China Inf Sci, 2025, 68(8): 184101

Keywords: NeRF; instant-NGP; depth supervision; three-dimensional reconstruction; neural radiance field

Cite as: Gu J M, Jiang M C, Lu X Y, et al. PD-NeRF: a general pseudo-depth supervision method for neural radiance fields. Sci China Inf Sci, 2025, 68: 184101, doi: 10.1007/s11432-023-4512-7

Graphics and Images | RESEARCH PAPER | Cited in SCI: 0

Alternating optimization for bundle adjustment with closed form solutions
Meng, Chengzhe; Jiang, Yiwen; Xu, Weiwei
Sci China Inf Sci, 2025, 68(6): 162103

Keywords: bundle adjustment; alternating optimization; 3D reconstruction; vision localization

Cite as: Meng C Z, Jiang Y W, Xu W W. Alternating optimization for bundle adjustment with closed form solutions. Sci China Inf Sci, 2025, 68: 162103, doi: 10.1007/s11432-023-4281-3

Graphics and Images | LETTER | Cited in SCI: 0

Low-rank Winograd transformation for 3D convolutional neural networks
Qin, Ziran; Lin, Mingbao; Liu, Huabin; See, John; Zou, Gui; Lin, Weiyao
Sci China Inf Sci, 2025, 68(5): 159101

Keywords: 3D convolutional neural networks; network pruning; low-rank transformation; Winograd algorithm; sparse granularity

Cite as: Qin Z R, Lin M B, Liu H B, et al. Low-rank Winograd transformation for 3D convolutional neural networks. Sci China Inf Sci, 2025, 68: 159101, doi: 10.1007/s11432-023-4340-9

Graphics and Images | MOOP | Cited in SCI: 0

Event-enhanced synthetic aperture imaging
Li, Siqi; Du, Shaoyi; Yong, Jun-Hai; Gao, Yue
Sci China Inf Sci, 2025, 68(3): 134101

Keywords: event camera; synthetic aperture imaging; image de-occlusion; multi-modal fusion; event-enhanced

Cite as: Li S Q, Du S Y, Yong J-H, et al. Event-enhanced synthetic aperture imaging. Sci China Inf Sci, 2025, 68: 134101, doi: 10.1007/s11432-023-4298-8

Graphics and Images | RESEARCH PAPER | Cited in SCI: 0

Distribution-flexible subset quantization for post-quantizing super-resolution networks
Zhong, Yunshan; Lin, Mingbao; Xie, Jingjing; Zhang, Yuxin; Chao, Fei; Ji, Rongrong
Sci China Inf Sci, 2025, 68(3): 132108

Keywords: super-resolution; post-training quantization; distribution-flexible; subset quantization; neural network

Cite as: Zhong Y S, Lin M B, Xie J J, et al. Distribution-flexible subset quantization for post-quantizing super-resolution networks. Sci China Inf Sci, 2025, 68: 132108, doi: 10.1007/s11432-023-4181-0

Graphics and Images | PERSPECTIVE | Cited in SCI: 0

Embodied computational imaging: a new paradigm for observing and analyzing spatiotemporally ultrasensitive phenomena at multiple scales
Chen, Baoquan; Lin, Zhouchen; Xi, Peng; Liu, Yebin; Chen, Xiaodian
Sci China Inf Sci, 2024, 67(11): 216101

Keywords: computational imaging; embodied AI; generative AI; microscopic imaging; time-domain astronomy

Cite as: Chen B Q, Lin Z C, Xi P, et al. Embodied computational imaging: a new paradigm for observing and analyzing spatiotemporally ultrasensitive phenomena at multiple scales. Sci China Inf Sci, 2024, 67: 216101, doi: 10.1007/s11432-024-4121-0

Graphics and Images | RESEARCH PAPER | Cited in SCI: 0

Meta label associated loss for fine-grained visual recognition
Li, Yanchao; Xiao, Fu; Li, Hao; Li, Qun; Yu, Shui
Sci China Inf Sci, 2024, 67(6): 162102

Keywords: label associated loss; weighting noisy samples; fine-grained visual recognition; noise-tolerant learning; meta-learning

Cite as: Li Y C, Xiao F, Li H, et al. Meta label associated loss for fine-grained visual recognition. Sci China Inf Sci, 2024, 67: 162102, doi: 10.1007/s11432-023-3922-2