Highly Cited in 202409 RESEARCH PAPER Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 58

Class attention network for image recognition
Cheng, Gong; Lai, Pujian; Gao, Decheng; Han, Junwei
Sci China Inf Sci, 2023, 66(3): 132105
Keywords: visual attention; class-specific attention encoding; class attention network; dictionary learning
Cite as: Cheng G, Lai P J, Gao D C, et al. Class attention network for image recognition. Sci China Inf Sci, 2023, 66(3): 132105, doi: 10.1007/s11432-021-3493-7

Highly Cited in 202409 RESEARCH PAPER Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 47

Fewer is more: efficient object detection in large aerial images
Xie, Xingxing; Cheng, Gong; Li, Qingyang; Miao, Shicheng; Li, Ke; Han, Junwei
Sci China Inf Sci, 2024, 67(1): 112106
Keywords: efficient object detection; large aerial images; objectness activation network
Cite as: Xie X X, Cheng G, Li Q Y, et al. Fewer is more: efficient object detection in large aerial images. Sci China Inf Sci, 2024, 67(1): 112106, doi: 10.1007/s11432-022-3718-5

PERSPECTIVE Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 22

SAM struggles in concealed scenes - empirical study on Segment Anything
Ji, Ge-Peng; Fan, Deng-Ping; Xu, Peng; Zhou, Bowen; Cheng, Ming-Ming; Van Gool, Luc
Sci China Inf Sci, 2023, 66(12): 226101
Keywords: Segment Anything; SAM; Camouflaged; Concealed Scene Understanding; Concealed Object Segmentation
Cite as: Ji G-P, Fan D-P, Xu P, et al. SAM struggles in concealed scenes - empirical study on Segment Anything. Sci China Inf Sci, 2023, 66(12): 226101, doi: 10.1007/s11432-023-3881-x

LETTER Supplementary Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 15

A unified pruning framework for vision transformers
Yu, Hao; Wu, Jianxin
Sci China Inf Sci, 2023, 66(7): 179101
Keywords: vision transformer; structural pruning; image classification; object detection; language modeling
Cite as: Yu H, Wu J X. A unified pruning framework for vision transformers. Sci China Inf Sci, 2023, 66(7): 179101, doi: 10.1007/s11432-022-3646-6

RESEARCH PAPER Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 12

ViGT: proposal-free video grounding with a learnable token in the transformer
Li, Kun; Guo, Dan; Wang, Meng
Sci China Inf Sci, 2023, 66(10): 202102
Keywords: video grounding; temporal sentence grounding; boundary regression; token learning; proposal-free
Cite as: Li K, Guo D, Wang M. ViGT: proposal-free video grounding with a learnable token in the transformer. Sci China Inf Sci, 2023, 66(10): 202102, doi: 10.1007/s11432-022-3783-3

REVIEW Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 11

Survey on leveraging pre-trained generative adversarial networks for image editing and restoration
Liu, Ming; Wei, Yuxiang; Wu, Xiaohe; Zuo, Wangmeng; Zhang, Lei
Sci China Inf Sci, 2023, 66(5): 151101
Keywords: survey; generative adversarial networks; pre-trained models; image editing; image restoration
Cite as: Liu M, Wei Y X, Wu X H, et al. Survey on leveraging pre-trained generative adversarial networks for image editing and restoration. Sci China Inf Sci, 2023, 66(5): 151101, doi: 10.1007/s11432-022-3679-0

Special Topic: Deep Learning for Computer Vision (2023)
SCIS Selected Articles on Deep Learning for Computer Vision (DLCV)
RESEARCH PAPER Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 9

Sparse spatial transformers for few-shot learning
Chen, Haoxing; Li, Huaxiong; Li, Yaohui; Chen, Chunlin
Sci China Inf Sci, 2023, 66(11): 210102
Keywords: few-shot learning; transformer; metric-learning; cross-attention
Cite as: Chen H X, Li H X, Li Y H, et al. Sparse spatial transformers for few-shot learning. Sci China Inf Sci, 2023, 66(11): 210102, doi: 10.1007/s11432-022-3700-8

RESEARCH PAPER Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 8

Augmented FCN: rethinking context modeling for semantic segmentation
Zhang, Dong; Zhang, Liyan; Tang, Jinhui
Sci China Inf Sci, 2023, 66(4): 142105
Keywords: semantic segmentation; context modeling; long-range dependencies; attention mechanism
Cite as: Zhang D, Zhang L Y, Tang J H. Augmented FCN: rethinking context modeling for semantic segmentation. Sci China Inf Sci, 2023, 66(4): 142105, doi: 10.1007/s11432-021-3590-1

RESEARCH PAPER Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 8

Deep graph learning for spatially-varying indoor lighting prediction
Bai, Jiayang; Guo, Jie; Wang, Chenchen; Chen, Zhenyu; He, Zhen; Yang, Shan; Yu, Piaopiao; Zhang, Yan; Guo, Yanwen
Sci China Inf Sci, 2023, 66(3): 132106
Keywords: lighting; graph learning; augmented reality; spherical Gaussians; rendering
Cite as: Bai J Y, Guo J, Wang C C, et al. Deep graph learning for spatially-varying indoor lighting prediction. Sci China Inf Sci, 2023, 66(3): 132106, doi: 10.1007/s11432-022-3576-9

Special Topic: Deep Learning for Computer Vision (2023)
RESEARCH PAPER Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 7

A meaningful learning method for zero-shot semantic segmentation
Liu, Xianglong; Bai, Shihao; An, Shan; Wang, Shuo; Liu, Wei; Zhao, Xiaowei; Ma, Yuqing
Sci China Inf Sci, 2023, 66(11): 210103
Keywords: meaningful learning; zero-shot learning; semantic segmentation; conjugate conceptual correlation; fast-slow conceptual modulator
Cite as: Liu X L, Bai S H, An S, et al. A meaningful learning method for zero-shot semantic segmentation. Sci China Inf Sci, 2023, 66(11): 210103, doi: 10.1007/s11432-022-3748-5

RESEARCH PAPER Supplementary Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 7

On learning the right attention point for feature enhancement
Lin, Liqiang; Huang, Pengdi; Fu, Chi-Wing; Xu, Kai; Zhang, Hao; Huang, Hui
Sci China Inf Sci, 2023, 66(1): 112107
Keywords: point convolution; feature enhancement; attention point; deep neural network
Cite as: Lin L Q, Huang P D, Fu C-W, et al. On learning the right attention point for feature enhancement. Sci China Inf Sci, 2023, 66(1): 112107, doi: 10.1007/s11432-021-3431-9

RESEARCH PAPER Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 6

Siamese transformer with hierarchical concept embedding for fine-grained image recognition
Lyu, Yilin; Jing, Liping; Wang, Jiaqi; Guo, Mingzhe; Wang, Xinyue; Yu, Jian
Sci China Inf Sci, 2023, 66(3): 132107
Keywords: fine-grained image recognition; transformer; hierarchical concept embedding; adaptive sampling; Siamese network
Cite as: Lyu Y L, Jing L P, Wang J Q, et al. Siamese transformer with hierarchical concept embedding for fine-grained image recognition. Sci China Inf Sci, 2023, 66(3): 132107, doi: 10.1007/s11432-022-3586-y

RESEARCH PAPER Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 5

RGB oralscan video-based orthodontic treatment monitoring
Tian, Yan; Fu, Hanshi; Wang, Hao; Liu, Yuqi; Xu, Zhaocheng; Chen, Hong; Li, Jianyuan; Wang, Ruili
Sci China Inf Sci, 2024, 67(1): 112107
Keywords: digital dentistry; object 6D pose estimation; deep learning; computer vision
Cite as: Tian Y, Fu H S, Wang H, et al. RGB oralscan video-based orthodontic treatment monitoring. Sci China Inf Sci, 2024, 67(1): 112107, doi: 10.1007/s11432-023-3847-x

Special Topic: Deep Learning for Computer Vision (2023)
RESEARCH PAPER Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 5

Confidence-weighted mutual supervision on dual networks for unsupervised cross-modality image segmentation
Chen, Yajie; Yang, Xin; Bai, Xiang
Sci China Inf Sci, 2023, 66(11): 210104
Keywords: domain adaptation; pseudo label; mutual supervision; cross-modality; image segmentation
Cite as: Chen Y J, Yang X, Bai X. Confidence-weighted mutual supervision on dual networks for unsupervised cross-modality image segmentation. Sci China Inf Sci, 2023, 66(11): 210104, doi: 10.1007/s11432-022-3871-0

LETTER Supplementary Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 5

Unpaired remote sensing image super-resolution with content-preserving weak supervision neural network
Wu, Jie; Cong, Runmin; Fang, Leyuan; Guo, Chunle; Zhang, Bob; Ghamisi, Pedram
Sci China Inf Sci, 2023, 66(1): 119105
Keywords: deep learning; remote sensing images; super-resolution; unpaired training; weak supervision
Cite as: Wu J, Cong R M, Fang L Y, et al. Unpaired remote sensing image super-resolution with content-preserving weak supervision neural network. Sci China Inf Sci, 2023, 66(1): 119105, doi: 10.1007/s11432-021-3575-1

LETTER Supplementary Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 5

TinyDet: accurately detecting small objects within 1 GFLOPs
Chen, Shaoyu; Cheng, Tianheng; Fang, Jiemin; Zhang, Qian; Li, Yuan; Liu, Wenyu; Wang, Xinggang
Sci China Inf Sci, 2023, 66(1): 119102
Keywords: small object detection; lightweight network; generic object detection; Tiny AI; high resolution
Cite as: Chen S Y, Cheng T H, Fang J M, et al. TinyDet: accurately detecting small objects within 1 GFLOPs. Sci China Inf Sci, 2023, 66(1): 119102, doi: 10.1007/s11432-021-3504-4

RESEARCH PAPER Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 4

Adaptive structured sparse multiview canonical correlation analysis for multimodal brain imaging association identification
Du, Lei; Wang, Huiai; Zhang, Jin; Zhang, Shu; Guo, Lei; Han, Junwei
Sci China Inf Sci, 2023, 66(4): 142106
Keywords: multimodal brain imaging correlation; sparse multiview canonical correlation analysis; task imbalance; adaptive loss balancing; graph-group penalty
Cite as: Du L, Wang H A, Zhang J, et al. Adaptive structured sparse multiview canonical correlation analysis for multimodal brain imaging association identification. Sci China Inf Sci, 2023, 66(4): 142106, doi: 10.1007/s11432-021-3589-5

Special Topic: Deep Learning for Computer Vision (2023)
RESEARCH PAPER Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 3

An end-to-end network for co-saliency detection in one single image
Yue, Yuanhao; Zou, Qin; Yu, Hongkai; Wang, Qian; Wang, Zhongyuan; Wang, Song
Sci China Inf Sci, 2023, 66(11): 210101
Keywords: saliency detection; convolutional neural network; regional feature mapping; co-saliency detection; deep learning
Cite as: Yue Y H, Zou Q, Yu H K, et al. An end-to-end network for co-saliency detection in one single image. Sci China Inf Sci, 2023, 66(11): 210101, doi: 10.1007/s11432-022-3686-1

MOOP Supplementary Video Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 3

ASPPR: active single-image piecewise planar 3D reconstruction based on geometric priors
Wang, Wei; Dong, Qiulei; Hu, Zhanyi
Sci China Inf Sci, 2023, 66(7): 174101
Keywords: single-image reconstruction; geometric prior; user interaction; multi-plane optimization; convolutional neural network
Cite as: Wang W, Dong Q L, Hu Z Y. ASPPR: active single-image piecewise planar 3D reconstruction based on geometric priors. Sci China Inf Sci, 2023, 66(7): 174101, doi: 10.1007/s11432-022-3631-6

RESEARCH PAPER Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 2

Rethinking attribute localization for zero-shot learning
Chen, Shuhuang; Chen, Shiming; Xie, Guo-Sen; Shu, Xiangbo; You, Xinge; Li, Xuelong
Sci China Inf Sci, 2024, 67(7): 172103
Keywords: zero-shot learning; attention mechanism; attribute localization; image classification
Cite as: Chen S H, Chen S M, Xie G-S, et al. Rethinking attribute localization for zero-shot learning. Sci China Inf Sci, 2024, 67(7): 172103, doi: 10.1007/s11432-023-4051-9

RESEARCH PAPER Supplementary Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 2

Hybrid3D: learning 3D hybrid features with point clouds and multi-view images for point cloud registration
Yang, Bangbang; Huang, Zhaoyang; Li, Yijin; Zhou, Han; Li, Hongsheng; Zhang, Guofeng; Bao, Hujun
Sci China Inf Sci, 2023, 66(7): 172101
Keywords: point cloud registration; cross-modal feature fusion; multi-view feature fusion; computer vision; deep learning
Cite as: Yang B B, Huang Z Y, Li Y J, et al. Hybrid3D: learning 3D hybrid features with point clouds and multi-view images for point cloud registration. Sci China Inf Sci, 2023, 66(7): 172101, doi: 10.1007/s11432-022-3604-6

RESEARCH PAPER Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 2

Improving pseudo-labeling with reliable inter-camera distance encouragement for unsupervised person re-identification
Chen, Yiyu; Fan, Zheyi; Chen, Shuni; Zhu, Yixuan
Sci China Inf Sci, 2023, 66(5): 152103
Keywords: person re-identification; noise label; camera variation; clustering; deep learning
Cite as: Chen Y Y, Fan Z Y, Chen S N, et al. Improving pseudo-labeling with reliable inter-camera distance encouragement for unsupervised person re-identification. Sci China Inf Sci, 2023, 66(5): 152103, doi: 10.1007/s11432-022-3628-y

RESEARCH PAPER Supplementary Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 2

Reflectance edge guided networks for detail-preserving intrinsic image decomposition
Li, Quewei; Guo, Jie; Wu, Zhengyi; Fei, Yang; Guo, Yanwen
Sci China Inf Sci, 2023, 66(2): 122105
Keywords: intrinsic image decomposition; detail-preserving; reflectance edges
Cite as: Li Q W, Guo J, Wu Z Y, et al. Reflectance edge guided networks for detail-preserving intrinsic image decomposition. Sci China Inf Sci, 2023, 66(2): 122105, doi: 10.1007/s11432-021-3481-3

RESEARCH PAPER Supplementary Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 2

Model-guided 3D stitching for augmented virtual environment
Zhou, Zhong; Meng, Ming; Zhou, Yi; Zhu, Zhe; You, Jingdi
Sci China Inf Sci, 2023, 66(1): 112106
Keywords: 3D stitching; video model; multiple video visualization; augmented virtual environment
Cite as: Zhou Z, Meng M, Zhou Y, et al. Model-guided 3D stitching for augmented virtual environment. Sci China Inf Sci, 2023, 66(1): 112106, doi: 10.1007/s11432-021-3323-2

RESEARCH PAPER Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 1

Saliency-guided meta-hallucinator for few-shot learning
Zhang, Hongguang; Liu, Chun; Wang, Jiandong; Ma, Linru; Koniusz, Piotr; Torr, Philip H. S.; Yang, Lin
Sci China Inf Sci, 2024, 67(10): 202103
Keywords: few-shot learning; saliency detection; object recognition; anomaly detection; computer vision
Cite as: Zhang H G, Liu C, Wang J D, et al. Saliency-guided meta-hallucinator for few-shot learning. Sci China Inf Sci, 2024, 67(10): 202103, doi: 10.1007/s11432-023-4113-1

LETTER Supplementary Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 1

SAM3D: zero-shot 3D object detection via the segment anything model
Zhang, Dingyuan; Liang, Dingkang; Yang, Hongcheng; Zou, Zhikang; Ye, Xiaoqing; Liu, Zhe; Bai, Xiang
Sci China Inf Sci, 2024, 67(4): 149101
Keywords: Zero-shot 3D Object Detection; Foundation Model; Segment Anything Model; BEV Perception; Segmentation
Cite as: Zhang D Y, Liang D K, Yang H C, et al. SAM3D: zero-shot 3D object detection via the segment anything model. Sci China Inf Sci, 2024, 67(4): 149101, doi: 10.1007/s11432-023-3943-6

RESEARCH PAPER Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 1

Retrieval-and-alignment based large-scale indoor point cloud semantic segmentation
Xu, Zongyi; Huang, Xiaoshui; Yuan, Bo; Wang, Yangfu; Zhang, Qianni; Li, Weisheng; Gao, Xinbo
Sci China Inf Sci, 2024, 67(4): 142104
Keywords: point cloud semantic segmentation; large-scale indoor point clouds; point cloud alignment; overlap estimation; label transfer
Cite as: Xu Z Y, Huang X S, Yuan B, et al. Retrieval-and-alignment based large-scale indoor point cloud semantic segmentation. Sci China Inf Sci, 2024, 67(4): 142104, doi: 10.1007/s11432-022-3928-x

LETTER Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 1

Scene text recognition via dual character counting-aware visual and semantic modeling network
Xiao, Ke; Zhu, Anna; Iwana, Brian Kenji; Liu, Cheng-Lin
Sci China Inf Sci, 2024, 67(3): 139101
Keywords: scene text recognition; language model; document analysis; deep learning; attention mechanism
Cite as: Xiao K, Zhu A N, Iwana B K, et al. Scene text recognition via dual character counting-aware visual and semantic modeling network. Sci China Inf Sci, 2024, 67(3): 139101, doi: 10.1007/s11432-023-3935-8

LETTER Supplementary Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 1

NeuralReshaper: single-image human-body retouching with deep neural networks
Chen, Beijia; Shen, Yuefan; Fu, Hongbo; Chen, Xiang; Zhou, Kun; Zheng, Youyi
Sci China Inf Sci, 2023, 66(9): 199101
Keywords: image manipulation; human-body reshaping; deep neural networks; generative adversarial network; self-supervised learning
Cite as: Chen B J, Shen Y F, Fu H B, et al. NeuralReshaper: single-image human-body retouching with deep neural networks. Sci China Inf Sci, 2023, 66(9): 199101, doi: 10.1007/s11432-022-3675-1

RESEARCH PAPER Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 1

Difficulty-aware prior-guided hierarchical network for adaptive segmentation of breast tumors
Hussain, Sumaira; Xi, Xiaoming; Ullah, Inam; Naim, Syeda Wajiha; Shaheed, Kashif; Tian, Cuihuan; Yin, Yilong
Sci China Inf Sci, 2023, 66(2): 122104
Keywords: breast tumor; ultrasound image segmentation; deep neural network; difficulty-awareness
Cite as: Hussain S, Xi X M, Ullah I, et al. Difficulty-aware prior-guided hierarchical network for adaptive segmentation of breast tumors. Sci China Inf Sci, 2023, 66(2): 122104, doi: 10.1007/s11432-021-3340-y

LETTER Supplementary Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 1

BiTGAN: bilateral generative adversarial networks for Chinese ink wash painting style transfer
He, Xiao; Zhu, Mingrui; Wang, Nannan; Wang, Xiaoyu; Gao, Xinbo
Sci China Inf Sci, 2023, 66(1): 119104
Keywords: Chinese ink wash painting; image style transfer; image-to-image translation; generative adversarial networks; GAN; bilateral-generator
Cite as: He X, Zhu M R, Wang N N, et al. BiTGAN: bilateral generative adversarial networks for Chinese ink wash painting style transfer. Sci China Inf Sci, 2023, 66(1): 119104, doi: 10.1007/s11432-022-3541-x

LETTER Supplementary Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 1

Learning cross-modal interaction for RGB-T tracking
Xu, Chunyan; Cui, Zhen; Wang, Chaoqun; Zhou, Chuanwei; Yang, Jian
Sci China Inf Sci, 2023, 66(1): 119103
Keywords: RGB-T tracking; cross-modal interaction; pixel-level correlation; relation-level correlation; visual object tracking
Cite as: Xu C Y, Cui Z, Wang C Q, et al. Learning cross-modal interaction for RGB-T tracking. Sci China Inf Sci, 2023, 66(1): 119103, doi: 10.1007/s11432-021-3518-y

LETTER Supplementary Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 1

Image co-segmentation based on pyramid features cross-correlation network
Chen, Jia; Chen, Yasong; Li, Weihao; Liu, Zhi; Liu, Sannyuya; Yang, Zongkai
Sci China Inf Sci, 2023, 66(1): 119101
Keywords: image co-segmentation; pyramid features; deep learning; pair-wise manner; VGG16
Cite as: Chen J, Chen Y S, Li W H, et al. Image co-segmentation based on pyramid features cross-correlation network. Sci China Inf Sci, 2023, 66(1): 119101, doi: 10.1007/s11432-021-3515-6

RESEARCH PAPER Supplementary Webpage Webpage-cn SpringerLink Google Scholar

ControlVideo: conditional control for one-shot text-driven video editing and beyond
Zhao M, Wang R Z, Bao F, et al
Sci China Inf Sci, 2025, 68(3): 132107
Keywords: diffusion models; controllable generation; text-driven editing; video editing; long video editing
Cite as: Zhao M, Wang R Z, Bao F, et al. ControlVideo: conditional control for one-shot text-driven video editing and beyond. Sci China Inf Sci, 2025, 68(3): 132107, doi: 10.1007/s11432-023-4184-4

MOOP Webpage Webpage-cn SpringerLink Google Scholar

Situation-adaptive neural network for fast pre-computing image enhancement
Li X Y, Duan H Y, Wang J, et al
Sci China Inf Sci, 2025, 68(2): 124101
Keywords: photonic computing; image enhancement; look up table; pre-computing; deep learning
Cite as: Li X Y, Duan H Y, Wang J, et al. Situation-adaptive neural network for fast pre-computing image enhancement. Sci China Inf Sci, 2025, 68(2): 124101, doi: 10.1007/s11432-024-4166-y

RESEARCH PAPER Webpage Webpage-cn SpringerLink Google Scholar

BEV-Locator: an end-to-end visual semantic localization network using multi-view images
Zhang Z H, Xu M, Zhou W Q, et al
Sci China Inf Sci, 2025, 68(2): 122106
Keywords: visual localization; semantic map; bird-eye-view; transformer; pose estimation
Cite as: Zhang Z H, Xu M, Zhou W Q, et al. BEV-Locator: an end-to-end visual semantic localization network using multi-view images. Sci China Inf Sci, 2025, 68(2): 122106, doi: 10.1007/s11432-023-4114-6

RESEARCH PAPER Webpage Webpage-cn SpringerLink Google Scholar

Aligning enhanced feature representation for generalized zero-shot learning
Fang Z Y, Zhu X B, Yang C, et al
Sci China Inf Sci, 2025, 68(2): 122102
Keywords: generalized zero-shot learning; gated attention mechanism; contrastive learning; multi-modal alignment
Cite as: Fang Z Y, Zhu X B, Yang C, et al. Aligning enhanced feature representation for generalized zero-shot learning. Sci China Inf Sci, 2025, 68(2): 122102, doi: 10.1007/s11432-023-4174-4

LETTER Supplementary Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 0

Multi-receptive field interaction network for shape from polarization
Peng, Yini; Liu, Rui; Zhang, Zhiyuan; Wang, Zhongyuan; Ma, Jiayi; Tian, Xin
Sci China Inf Sci, 2025, 68(1): 119102
Keywords: shape from polarization; 3D reconstruction; deep learning; multi-receptive field interaction; surface normal
Cite as: Peng Y N, Liu R, Zhang Z Y, et al. Multi-receptive field interaction network for shape from polarization. Sci China Inf Sci, 2025, 68(1): 119102, doi: 10.1007/s11432-024-4212-2

RESEARCH PAPER Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 0

COMPrompter: reconceptualized segment anything model with multiprompt network for camouflaged object detection
Zhang, Xiaoqin; Yu, Zhenni; Zhao, Li; Fan, Deng-Ping; Xiao, Guobao
Sci China Inf Sci, 2025, 68(1): 112104
Keywords: segment anything model; camouflaged object detection; boundary; prompt
Cite as: Zhang X Q, Yu Z N, Zhao L, et al. COMPrompter: reconceptualized segment anything model with multiprompt network for camouflaged object detection. Sci China Inf Sci, 2025, 68(1): 112104, doi: 10.1007/s11432-024-4233-9

LETTER Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 0

DcnnGrasp: towards accurate grasp pattern recognition with adaptive regularizer learning
Zhang, Xiaoqin; Huang, Ziwei; Zheng, Jingjing; Wang, Shuo; Jiang, Xianta
Sci China Inf Sci, 2024, 67(12): 229102
Keywords: grasp pattern recognition; computer vision; convolutional neural networks; deep learning; adaptive regularizer learning
Cite as: Zhang X Q, Huang Z W, Zheng J J, et al. DcnnGrasp: towards accurate grasp pattern recognition with adaptive regularizer learning. Sci China Inf Sci, 2024, 67(12): 229102, doi: 10.1007/s11432-022-4237-4

Special Topic: Large Multimodal Models
LETTER Supplementary Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 0

ChemDFM-X: towards large multimodal model for chemistry
Zhao, Zihan; Chen, Bo; Li, Jingpiao; Chen, Lu; Wen, Liyang; Wang, Pengyu; Zhu, Zichen; Zhang, Danyang; Li, Yansi; Dai, Zhongyang; Chen, Xin; Yu, Kai
Sci China Inf Sci, 2024, 67(12): 220109
Keywords: LMM; AI for Science; Instruction-Tuning; Cross-Modality; Chemistry
Cite as: Zhao Z H, Chen B, Li J P, et al. ChemDFM-X: towards large multimodal model for chemistry. Sci China Inf Sci, 2024, 67(12): 220109, doi: 10.1007/s11432-024-4243-0

Special Topic: Large Multimodal Models
LETTER Supplementary Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 0

COMET: "cone of experience" enhanced large multimodal model for mathematical problem generation
Liu, Sannyuya; Feng, Jintian; Yang, Zongkai; Luo, Yawei; Wan, Qian; Shen, Xiaoxuan; Sun, Jianwen
Sci China Inf Sci, 2024, 67(12): 220108
Keywords: mathematical problem generation; mathematical problem solving; large multimodal model; LMM; educational application; smart education
Cite as: Liu S N Y, Feng J T, Yang Z K, et al. COMET: "cone of experience" enhanced large multimodal model for mathematical problem generation. Sci China Inf Sci, 2024, 67(12): 220108, doi: 10.1007/s11432-024-4242-0

Special Topic: Large Multimodal Models
RESEARCH PAPER Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 0

Modality-experts coordinated adaptation for large multimodal models
Zhang, Yan; Ji, Zhong; Pang, Yanwei; Han, Jungong; Li, Xuelong
Sci China Inf Sci, 2024, 67(12): 220107
Keywords: large multimodal model; LMM; multimodal learning; vision-language pretraining; parameter-efficient fine-tuning; adapter; modality expert
Cite as: Zhang Y, Ji Z, Pang Y W, et al. Modality-experts coordinated adaptation for large multimodal models. Sci China Inf Sci, 2024, 67(12): 220107, doi: 10.1007/s11432-024-4234-4

Special Topic: Large Multimodal Models
RESEARCH PAPER Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 0

DocPedia: unleashing the power of large multimodal model in the frequency domain for versatile document understanding
Feng, Hao; Liu, Qi; Liu, Hao; Tang, Jingqun; Zhou, Wengang; Li, Houqiang; Huang, Can
Sci China Inf Sci, 2024, 67(12): 220106
Keywords: document understanding; large multimodal model; LMM; OCR-free; high-resolution; frequency
Cite as: Feng H, Liu Q, Liu H, et al. DocPedia: unleashing the power of large multimodal model in the frequency domain for versatile document understanding. Sci China Inf Sci, 2024, 67(12): 220106, doi: 10.1007/s11432-024-4250-y

Special Topic: Large Multimodal Models
RESEARCH PAPER Supplementary Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 0

Woodpecker: hallucination correction for multimodal large language models
Yin, Shukang; Fu, Chaoyou; Zhao, Sirui; Xu, Tong; Wang, Hao; Sui, Dianbo; Shen, Yunhang; Li, Ke; Sun, Xing; Chen, Enhong
Sci China Inf Sci, 2024, 67(12): 220105
Keywords: multimodal learning; multimodal large language models; hallucination correction; large language models; vision and language; LMM
Cite as: Yin S K, Fu C Y, Zhao S R, et al. Woodpecker: hallucination correction for multimodal large language models. Sci China Inf Sci, 2024, 67(12): 220105, doi: 10.1007/s11432-024-4251-x

Special Topic: Large Multimodal Models
RESEARCH PAPER Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 0

MMInstruct: a high-quality multi-modal instruction tuning dataset with extensive diversity
Liu, Yangzhou; Cao, Yue; Gao, Zhangwei; Wang, Weiyun; Chen, Zhe; Wang, Wenhai; Tian, Hao; Lu, Lewei; Zhu, Xizhou; Lu, Tong; Qiao, Yu; Dai, Jifeng
Sci China Inf Sci, 2024, 67(12): 220103
Keywords: instruction tuning; multi-modal; multi-domain; dataset; vision large language model; LMM
Cite as: Liu Y Z, Cao Y, Gao Z W, et al. MMInstruct: a high-quality multi-modal instruction tuning dataset with extensive diversity. Sci China Inf Sci, 2024, 67(12): 220103, doi: 10.1007/s11432-024-4187-3

Special Topic: Large Multimodal Models
RESEARCH PAPER Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 0

OCRBench: on the hidden mystery of OCR in large multimodal models
Liu, Yuliang; Li, Zhang; Huang, Mingxin; Yang, Biao; Yu, Wenwen; Li, Chunyuan; Yin, Xu-Cheng; Liu, Cheng-Lin; Jin, Lianwen; Bai, Xiang
Sci China Inf Sci, 2024, 67(12): 220102
Keywords: large multimodal model; LMM; OCR; text recognition; scene text-centric VQA; document-oriented VQA; key information extraction; handwritten mathematical expression recognition
Cite as: Liu Y L, Li Z, Huang M X, et al. OCRBench: on the hidden mystery of OCR in large multimodal models. Sci China Inf Sci, 2024, 67(12): 220102, doi: 10.1007/s11432-024-4235-6

Special Topic: Large Multimodal Models
RESEARCH PAPER Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 0

How far are we to GPT-4V? Closing the gap to commercial multimodal models with open-source suites
Chen, Zhe; Wang, Weiyun; Tian, Hao; Ye, Shenglong; Gao, Zhangwei; Cui, Erfei; Tong, Wenwen; Hu, Kongzhi; Luo, Jiapeng; Ma, Zheng; Ma, Ji; Wang, Jiaqi; Dong, Xiaoyi; Yan, Hang; Guo, Hewei; He, Conghui; Shi, Botian; Jin, Zhenjiang; Xu, Chao; Wang, Bin; Wei, Xingjian; Li, Wei; Zhang, Wenjian; Zhang, Bo; Cai, Pinlong; Wen, Licheng; Yan, Xiangchao; Dou, Min; Lu, Lewei; Zhu, Xizhou; Lu, Tong; Lin, Dahua; Qiao, Yu; Dai, Jifeng; Wang, Wenhai
Sci China Inf Sci, 2024, 67(12): 220101
Keywords: multimodal model; open-source; vision encoder; dynamic resolution; bilingual dataset; LMM
Cite as: Chen Z, Wang W Y, Tian H, et al. How far are we to GPT-4V? Closing the gap to commercial multimodal models with open-source suites. Sci China Inf Sci, 2024, 67(12): 220101, doi: 10.1007/s11432-024-4231-5

PERSPECTIVE Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 0

Embodied computational imaging: a new paradigm for observing and analyzing spatiotemporally ultrasensitive phenomena at multiple scales
Chen, Baoquan; Lin, Zhouchen; Xi, Peng; Liu, Yebin; Chen, Xiaodian
Sci China Inf Sci, 2024, 67(11): 216101
Keywords: computational imaging; embodied AI; generative AI; microscopic imaging; time-domain astronomy
Cite as: Chen B Q, Lin Z C, Xi P, et al. Embodied computational imaging: a new paradigm for observing and analyzing spatiotemporally ultrasensitive phenomena at multiple scales. Sci China Inf Sci, 2024, 67(11): 216101, doi: 10.1007/s11432-024-4121-0

RESEARCH PAPER Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 0

PointSmile: point self-supervised learning via curriculum mutual information
Li, Xin; Wei, Mingqiang; Chen, Songcan
Sci China Inf Sci, 2024, 67(11): 212104
Keywords: PointSmile; self-supervised learning; curriculum mutual information; point cloud; representation learning
Cite as: Li X, Wei M Q, Chen S C. PointSmile: point self-supervised learning via curriculum mutual information. Sci China Inf Sci, 2024, 67(11): 212104, doi: 10.1007/s11432-023-4085-9

RESEARCH PAPER Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 0

Robust video question answering via contrastive cross-modality representation learning
Yang, Xun; Zeng, Jianming; Guo, Dan; Wang, Shanshan; Dong, Jianfeng; Wang, Meng
Sci China Inf Sci, 2024, 67(10): 202104
Keywords: video question answering; cross-modality fusion; contrastive learning; cross-media reasoning
Cite as: Yang X, Zeng J M, Guo D, et al. Robust video question answering via contrastive cross-modality representation learning. Sci China Inf Sci, 2024, 67(10): 202104, doi: 10.1007/s11432-023-4084-6

RESEARCH PAPER Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 0

Relative difficulty distillation for semantic segmentation
Liang, Dong; Sun, Yue; Du, Yun; Chen, Songcan; Huang, Sheng-Jun
Sci China Inf Sci, 2024, 67(9): 192105
Keywords: knowledge distillation; semantic segmentation; relative difficulty; sample weighting; prediction discrepancy
Cite as: Liang D, Sun Y, Du Y, et al. Relative difficulty distillation for semantic segmentation. Sci China Inf Sci, 2024, 67(9): 192105, doi: 10.1007/s11432-023-4061-2

LETTER Supplementary Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 0

Wavelet-domain feature decoupling for weakly supervised multi-object tracking
Li, Yu-Lei; Yan, Yan; Lu, Yang; Wang, Hanzi
Sci China Inf Sci, 2024, 67(8): 189102
Keywords: multi-object tracking; weakly supervised learning; feature-decoupling transformer; noisy intermediate features; well-refined embedding features
Cite as: Li Y-L, Yan Y, Lu Y, et al. Wavelet-domain feature decoupling for weakly supervised multi-object tracking. Sci China Inf Sci, 2024, 67(8): 189102, doi: 10.1007/s11432-022-4097-y

RESEARCH PAPER Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 0

SeeMore: a spatiotemporal predictive model with bidirectional distillation and level-specific meta-adaptation
Ma, Yuqing; Liu, Wei; Gao, Yajun; Yuan, Yang; Bai, Shihao; Qin, Haotong; Liu, Xianglong
Sci China Inf Sci, 2024, 67(8): 182104
Keywords: spatiotemporal predictive learning; knowledge transfer; bidirectional distillation network; level-specific meta-adapter; coarse-to-fine training
Cite as: Ma Y Q, Liu W, Gao Y J, et al. SeeMore: a spatiotemporal predictive model with bidirectional distillation and level-specific meta-adaptation. Sci China Inf Sci, 2024, 67(8): 182104, doi: 10.1007/s11432-022-3859-8

RESEARCH PAPER Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 0

Towards imbalanced motion: part-decoupling network for video portrait segmentation
Yu, Tianshu; Xia, Changqun; Li, Jia
Sci China Inf Sci, 2024, 67(7): 172104
Keywords: video portrait segmentation; imbalanced motion; unsupervised part decoupling; motion correlation; inter-frame attention
Cite as: Yu T S, Xia C Q, Li J. Towards imbalanced motion: part-decoupling network for video portrait segmentation. Sci China Inf Sci, 2024, 67(7): 172104, doi: 10.1007/s11432-023-4030-y

LETTER Supplementary Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 0

Pyramid-resolution person restoration for cross-resolution person re-identification
Peng, Chunlei; Wang, Bo; Liu, Decheng; Wang, Nannan; Gao, Xinbo
Sci China Inf Sci, 2024, 67(6): 169101
Keywords: pyramid-resolution; cross-resolution person ReID; image restoration; feature distance fusion; multi-resolution person ReID
Cite as: Peng C L, Wang B, Liu D C, et al. Pyramid-resolution person restoration for cross-resolution person re-identification. Sci China Inf Sci, 2024, 67(6): 169101, doi: 10.1007/s11432-023-4023-y

RESEARCH PAPER Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 0

Meta label associated loss for fine-grained visual recognition
Li, Yanchao; Xiao, Fu; Li, Hao; Li, Qun; Yu, Shui
Sci China Inf Sci, 2024, 67(6): 162102
Keywords: label associated loss; weighting noisy samples; fine-grained visual recognition; noise-tolerant learning; meta-learning
Cite as: Li Y C, Xiao F, Li H, et al. Meta label associated loss for fine-grained visual recognition. Sci China Inf Sci, 2024, 67(6): 162102, doi: 10.1007/s11432-023-3922-2

MOOP Supplementary Video Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 0

A unified user behavior model for trajectory-based tasks with different types of path constraints
Zhang, Hao; Huang, Jin; Tu, Huawei; Tian, Feng; Dai, Guozhong; Wang, Hongan
Sci China Inf Sci, 2023, 66(10): 204101
Keywords: human-computer interaction; user behavior modeling; artificial potential field; trajectory-based task; trajectory predicting
Cite as: Zhang H, Huang J, Tu H W, et al. A unified user behavior model for trajectory-based tasks with different types of path constraints. Sci China Inf Sci, 2023, 66(10): 204101, doi: 10.1007/s11432-023-3779-5

RESEARCH PAPER Webpage Webpage-cn SpringerLink Google Scholar Cited in SCI: 0

An energy constraint position-based dynamics with corrected SPH kernel
Cao, Wei; Lyu, Luan; Yang, Zhixin; Wu, Enhua
Sci China Inf Sci, 2023, 66(1): 112108
Keywords: position-based dynamics; deformable solids; continuum mechanics
Cite as: Cao W, Lyu L, Yang Z X, et al. An energy constraint position-based dynamics with corrected SPH kernel. Sci China Inf Sci, 2023, 66(1): 112108, doi: 10.1007/s11432-021-3464-2