VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition

Cited by: 15
Authors
Tian, Changyao [1, 4]
Wang, Wenhai [3 ]
Zhu, Xizhou [2 ]
Dai, Jifeng [2 ]
Qiao, Yu [3 ]
Affiliations
[1] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[2] SenseTime, Hong Kong, Peoples R China
[3] Shanghai AI Lab, Shanghai, Peoples R China
[4] SenseTime Res, Hong Kong, Peoples R China
Keywords
Long-tailed recognition; Vision-language models; SMOTE;
DOI
10.1007/978-3-031-19806-9_5
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recently, computer vision foundation models such as CLIP and ALIGN have shown impressive generalization capabilities on various downstream tasks, but their ability to handle long-tailed data remains to be proved. In this work, we present a novel framework based on pre-trained visual-linguistic models for long-tailed recognition (LTR), termed VL-LTR, and conduct empirical studies on the benefits of introducing the text modality for long-tailed recognition tasks. Compared to existing approaches, the proposed VL-LTR has the following merits: (1) our method learns not only visual representations from images but also the corresponding linguistic representations from noisy class-level text descriptions collected from the Internet; (2) our method can effectively use the learned visual-linguistic representation to improve visual recognition performance, especially for classes with fewer image samples. We also conduct extensive experiments and set new state-of-the-art performance on widely used LTR benchmarks. Notably, our method achieves 77.2% overall accuracy on ImageNet-LT, which significantly outperforms the previous best method by over 17 points and is close to the prevailing performance of training on the full ImageNet. Code is available at https://github.com/ChangyaoTian/VL-LTR.
Pages: 73-91
Page count: 19