Rectify representation bias in vision-language models for long-tailed recognition

Citations: 4
Authors
Li, Bo [1 ]
Yao, Yongqiang [2 ]
Tan, Jingru [3 ]
Gong, Ruihao [2 ]
Lu, Jianwei [4 ]
Luo, Ye [1 ]
Affiliations
[1] Tongji Univ, 4800 Caoan Rd, Shanghai 201804, Peoples R China
[2] Sensetime Res, 1900 Hongmei Rd, Shanghai 201103, Peoples R China
[3] Cent South Univ, 932 South Lushan Rd, Changsha 410083, Hunan, Peoples R China
[4] Shanghai Univ Tradit Chinese Med, 530 Lingling Rd, Shanghai 201203, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Long-tailed recognition; Vision-language model; Representation bias; SMOTE;
DOI
10.1016/j.neunet.2024.106134
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Natural data typically exhibit a long-tailed distribution, posing great challenges for recognition tasks. Due to the extreme scarcity of training instances, tail classes often show inferior performance. In this paper, we investigate the problem within the popular vision-language (VL) framework and find that the performance bottleneck mainly arises from recognition confusion between tail classes and their highly correlated head classes. Building on this observation, and unlike previous research that primarily emphasizes class frequency when addressing long-tailed issues, we take a novel perspective by incorporating a crucial additional factor: class correlation. Specifically, we model the representation learning procedure for each sample as two parts, i.e., a specific part that learns the unique properties of its own class and a common part that learns characteristics shared among classes. Through analysis, we find that the learning of the common representation is easily biased toward head classes. Because of this bias, the network may rely on the biased common representation as its classification criterion, rather than prioritizing the crucial information encapsulated in the specific representation, ultimately leading to recognition confusion. To solve this problem, we introduce, on top of the VL framework, a rectification contrastive term (ReCT) that rectifies the representation bias according to semantic hints and training status. Extensive experiments on three widely used long-tailed datasets demonstrate the effectiveness of ReCT. On iNaturalist2018, it achieves an overall accuracy of 75.4%, surpassing the baseline by 3.6 points with a ResNet-50 visual backbone.
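The abstract only describes ReCT at a high level, so the following is a minimal, hypothetical PyTorch sketch of a rectification-style contrastive term for a CLIP-like vision-language model. It is not the authors' exact formulation: the function name `rectified_contrastive_loss`, the per-class weighting scheme, and the inverse-frequency example are all illustrative assumptions, whereas the paper rectifies the bias according to semantic hints and training status.

```python
# Hypothetical sketch of a rectification-style contrastive term for a
# CLIP-like vision-language model. NOT the authors' exact ReCT; the
# weighting scheme and all names are illustrative assumptions.
import torch
import torch.nn.functional as F


def rectified_contrastive_loss(image_feats, text_feats, labels,
                               class_weights, temperature=0.07):
    """Image-to-text contrastive loss with per-class re-weighting.

    image_feats:   (B, D) image embeddings from the visual backbone.
    text_feats:    (C, D) text embeddings, one per class (semantic hints).
    labels:        (B,)   ground-truth class indices.
    class_weights: (C,)   per-class weights, e.g. larger for tail classes
                          that are easily confused with correlated head
                          classes (a static placeholder for the paper's
                          rectification based on training status).
    """
    image_feats = F.normalize(image_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)

    # (B, C) similarity logits between each image and every class prompt.
    logits = image_feats @ text_feats.t() / temperature

    # Cross-entropy over classes, re-weighted sample-wise by the weight
    # of each sample's ground-truth class.
    per_sample = F.cross_entropy(logits, labels, reduction="none")
    return (class_weights[labels] * per_sample).mean()


if __name__ == "__main__":
    B, C, D = 8, 5, 512
    img = torch.randn(B, D)
    txt = torch.randn(C, D)
    y = torch.randint(0, C, (B,))
    # Crude stand-in for the rectification signal: inverse-frequency
    # weights, normalized to have mean 1.
    freq = torch.tensor([100.0, 80.0, 40.0, 10.0, 2.0])
    w = freq.sum() / freq
    w = w / w.mean()
    print(rectified_contrastive_loss(img, txt, y, w).item())
```

The sketch only illustrates the general idea of re-weighting an image-text contrastive objective toward under-represented, easily confused classes; the paper's ReCT additionally adapts this rectification over the course of training.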
Pages: 10