VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition

Cited by: 15
Authors
Tian, Changyao [1 ,4 ]
Wang, Wenhai [3 ]
Zhu, Xizhou [2 ]
Dai, Jifeng [2 ]
Qiao, Yu [3 ]
Affiliations
[1] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[2] SenseTime, Hong Kong, Peoples R China
[3] Shanghai AI Lab, Shanghai, Peoples R China
[4] SenseTime Res, Hong Kong, Peoples R China
Keywords
Long-tailed recognition; Vision-language models; SMOTE
DOI
10.1007/978-3-031-19806-9_5
CLC number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recently, computer vision foundation models such as CLIP and ALIGN have shown impressive generalization capabilities on various downstream tasks, but their ability to handle long-tailed data remains to be demonstrated. In this work, we present a novel framework based on pre-trained visual-linguistic models for long-tailed recognition (LTR), termed VL-LTR, and conduct empirical studies on the benefits of introducing the text modality for long-tailed recognition tasks. Compared with existing approaches, the proposed VL-LTR has the following merits. (1) Our method learns not only visual representations from images but also the corresponding linguistic representations from noisy class-level text descriptions collected from the Internet; (2) our method can effectively use the learned visual-linguistic representations to improve visual recognition performance, especially for classes with few image samples. We also conduct extensive experiments and set a new state of the art on widely used LTR benchmarks. Notably, our method achieves 77.2% overall accuracy on ImageNet-LT, which significantly outperforms the previous best method by over 17 points and is close to the prevailing performance obtained by training on the full ImageNet. Code is available at https://github.com/ChangyaoTian/VL-LTR.
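To make the class-wise visual-linguistic idea above concrete, the following Python sketch scores images by cosine similarity between image features and per-class text prototypes averaged over several noisy sentences. This is a minimal illustration under stated assumptions, not the released VL-LTR implementation: the encoder interfaces, the helper names class_text_prototypes and predict_logits, and the temperature value are all hypothetical.

import torch
import torch.nn.functional as F

def class_text_prototypes(text_encoder, class_sentences):
    # Hypothetical helper: average several noisy sentence embeddings per
    # class into a single L2-normalized text prototype for that class.
    protos = []
    for sentences in class_sentences:                        # one tokenized batch per class
        emb = F.normalize(text_encoder(sentences), dim=-1)   # (S, D) sentence embeddings
        protos.append(F.normalize(emb.mean(dim=0), dim=-1))  # (D,) class prototype
    return torch.stack(protos)                               # (C, D)

def predict_logits(image_encoder, images, prototypes, temperature=0.07):
    # Logits are scaled cosine similarities to each class prototype, so a
    # class with few training images still gets a text-anchored classifier.
    feats = F.normalize(image_encoder(images), dim=-1)       # (B, D) image features
    return feats @ prototypes.t() / temperature              # (B, C) class logits

Because every class receives a text-anchored prototype regardless of how many training images it has, tail classes are scored by the same mechanism as head classes; in the paper, a pretraining stage first aligns the two modalities before the recognition stage, which this sketch omits.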
Pages: 73-91
Page count: 19
Related papers
50 records in total
  • [21] Disentangling Label Distribution for Long-tailed Visual Recognition
    Hong, Youngkyu
    Han, Seungju
    Choi, Kwanghee
    Seo, Seokjun
    Kim, Beomsu
    Chang, Buru
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 6622 - 6632
  • [23] A dual progressive strategy for long-tailed visual recognition
    Liang, Hong
    Cao, Guoqing
    Shao, Mingwen
    Zhang, Qian
    MACHINE VISION AND APPLICATIONS, 2024, 35 (01)
  • [24] Self Supervision to Distillation for Long-Tailed Visual Recognition
    Li, Tianhao
    Wang, Limin
    Wu, Gangshan
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 610 - 619
  • [25] bt-vMF Contrastive and Collaborative Learning for Long-Tailed Visual Recognition
    Du, Jinhao
    Luo, Guibo
    Zhu, Yuesheng
    Bai, Zhiqiang
    2023 IEEE 35TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, ICTAI, 2023, : 573 - 577
  • [26] NCL++: Nested Collaborative Learning for long-tailed visual recognition
    Tan, Zichang
    Li, Jun
    Du, Jinhao
    Wan, Jun
    Lei, Zhen
    Guo, Guodong
    PATTERN RECOGNITION, 2024, 147
  • [27] Dynamic collaborative learning with heterogeneous knowledge transfer for long-tailed visual recognition
    Zhou, Hao
    Luo, Tingjin
    He, Yongming
    INFORMATION FUSION, 2025, 115
  • [28] Key Point Sensitive Loss for Long-Tailed Visual Recognition
    Li, Mengke
    Cheung, Yiu-Ming
    Hu, Zhikai
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (04) : 4812 - 4825
  • [29] Dynamic Learnable Logit Adjustment for Long-Tailed Visual Recognition
    Zhang, Enhao
    Geng, Chuanxing
    Li, Chaohua
    Chen, Songcan
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (09) : 7986 - 7997
  • [30] Feature Re-Balancing for Long-Tailed Visual Recognition
    Zhao, Yan
    Chen, Weicong
    Huang, Kai
    Zhu, Jihong
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022