VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition

Cited by: 15
Authors
Tian, Changyao [1, 4]
Wang, Wenhai [3 ]
Zhu, Xizhou [2 ]
Dai, Jifeng [2 ]
Qiao, Yu [3 ]
Affiliations
[1] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[2] SenseTime, Hong Kong, Peoples R China
[3] Shanghai AI Lab, Shanghai, Peoples R China
[4] SenseTime Res, Hong Kong, Peoples R China
Source
Keywords
Long-tailed recognition; Vision-language models; SMOTE
DOI
10.1007/978-3-031-19806-9_5
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
Recently, computer vision foundation models such as CLIP and ALIGN have shown impressive generalization capabilities on various downstream tasks, but their ability to handle long-tailed data remains to be proven. In this work, we present a novel framework based on pre-trained visual-linguistic models for long-tailed recognition (LTR), termed VL-LTR, and conduct empirical studies on the benefits of introducing the text modality for long-tailed recognition tasks. Compared to existing approaches, the proposed VL-LTR has the following merits: (1) our method learns not only visual representations from images but also the corresponding linguistic representations from noisy class-level text descriptions collected from the Internet; (2) our method effectively uses the learned visual-linguistic representation to improve visual recognition performance, especially for classes with fewer image samples. We also conduct extensive experiments and set a new state of the art on widely used LTR benchmarks. Notably, our method achieves 77.2% overall accuracy on ImageNet-LT, which outperforms the previous best method by over 17 points and is close to the prevailing performance of models trained on the full ImageNet. Code is available at https://github.com/ChangyaoTian/VL-LTR.
Pages: 73-91
Number of pages: 19
Related papers
50 in total
  • [31] FCC: Feature Clusters Compression for Long-Tailed Visual Recognition
    Li, Jian
    Meng, Ziyao
    Shi, Daqian
    Song, Rui
    Diao, Xiaolei
    Wang, Jingwen
    Xu, Hao
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 24080 - 24089
  • [32] Feature calibration and feature separation for long-tailed visual recognition
    Wang, Qianqian
    Zhou, Fangyu
    Zhao, Xiangge
    Lin, Yangtao
    Ye, Haibo
    NEUROCOMPUTING, 2025, 637
  • [33] Adaptive Logit Adjustment Loss for Long-Tailed Visual Recognition
    Zhao, Yan
    Chen, Weicong
    Tan, Xu
    Huang, Kai
    Zhu, Jihong
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 3472 - 3480
  • [34] Hierarchical block aggregation network for long-tailed visual recognition
    Pang, Shanmin
    Wang, Weiye
    Zhang, Renzhong
    Hao, Wenyu
    NEUROCOMPUTING, 2023, 549
  • [35] MetaSAug: Meta Semantic Augmentation for Long-Tailed Visual Recognition
    Li, Shuang
    Gong, Kaixiong
    Liu, Chi Harold
    Wang, Yulin
    Qiao, Feng
    Cheng, Xinjing
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 5208 - 5217
  • [36] Dynamic prior probability network for long-tailed visual recognition
    Zhou, Xuesong
    Sun, Jiaqi
    Zhai, Junhai
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 268
  • [37] Long-tailed Visual Recognition via Gaussian Clouded Logit Adjustment
    Li, Mengke
    Cheung, Yiu-Ming
    Lu, Yang
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 6919 - 6928
  • [38] Long-tailed visual recognition with deep models: A methodological survey and evaluation
    Fu, Yu
    Xiang, Liuyu
    Zahid, Yumna
    Ding, Guiguang
    Mei, Tao
    Shen, Qiang
    Han, Jungong
    NEUROCOMPUTING, 2022, 509 : 290 - 309
  • [39] Contrastive dual-branch network for long-tailed visual recognition
    Miao, Jie
    Zhai, Junhai
    Han, Ling
    PATTERN ANALYSIS AND APPLICATIONS, 2025, 28 (01)
  • [40] Feature Fusion from Head to Tail for Long-Tailed Visual Recognition
    Li, Mengke
    Hu, Zhikai
    Lu, Yang
    Lan, Weichao
    Cheung, Yiu-ming
    Huang, Hui
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 12, 2024, : 13581 - 13589