VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition

Cited by: 15
Authors
Tian, Changyao [1 ,4 ]
Wang, Wenhai [3 ]
Zhu, Xizhou [2 ]
Dai, Jifeng [2 ]
Qiao, Yu [3 ]
Affiliations
[1] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[2] SenseTime, Hong Kong, Peoples R China
[3] Shanghai AI Lab, Shanghai, Peoples R China
[4] SenseTime Res, Hong Kong, Peoples R China
Keywords
Long-tailed recognition; Vision-language models; SMOTE
DOI
10.1007/978-3-031-19806-9_5
CLC number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recently, computer vision foundation models such as CLIP and ALIGN have shown impressive generalization capabilities on various downstream tasks, but their ability to handle long-tailed data remains to be demonstrated. In this work, we present a novel framework based on pre-trained visual-linguistic models for long-tailed recognition (LTR), termed VL-LTR, and conduct empirical studies on the benefits of introducing the text modality for long-tailed recognition tasks. Compared with existing approaches, the proposed VL-LTR has the following merits. (1) Our method learns not only visual representations from images but also the corresponding linguistic representations from noisy class-level text descriptions collected from the Internet; (2) our method can effectively use the learned visual-linguistic representations to improve visual recognition performance, especially for classes with few image samples. We also conduct extensive experiments and set a new state of the art on widely used LTR benchmarks. Notably, our method achieves 77.2% overall accuracy on ImageNet-LT, which significantly outperforms the previous best method by over 17 points and is close to the prevailing performance obtained by training on the full ImageNet. Code is available at https://github.com/ChangyaoTian/VL-LTR.
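To make the class-wise visual-linguistic idea above concrete, the following Python sketch scores images by cosine similarity between image features and per-class text prototypes averaged over several noisy sentences. This is a minimal illustration under stated assumptions, not the released VL-LTR implementation: the encoder interfaces, the helper names class_text_prototypes and predict_logits, and the temperature value are all hypothetical.

import torch
import torch.nn.functional as F

def class_text_prototypes(text_encoder, class_sentences):
    # Hypothetical helper: average several noisy sentence embeddings per
    # class into a single L2-normalized text prototype for that class.
    protos = []
    for sentences in class_sentences:                        # one tokenized batch per class
        emb = F.normalize(text_encoder(sentences), dim=-1)   # (S, D) sentence embeddings
        protos.append(F.normalize(emb.mean(dim=0), dim=-1))  # (D,) class prototype
    return torch.stack(protos)                               # (C, D)

def predict_logits(image_encoder, images, prototypes, temperature=0.07):
    # Logits are scaled cosine similarities to each class prototype, so a
    # class with few training images still gets a text-anchored classifier.
    feats = F.normalize(image_encoder(images), dim=-1)       # (B, D) image features
    return feats @ prototypes.t() / temperature              # (B, C) class logits

Because every class receives a text-anchored prototype regardless of how many training images it has, tail classes are scored by the same mechanism as head classes; in the paper, a pretraining stage first aligns the two modalities before the recognition stage, which this sketch omits.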
Pages: 73-91
Page count: 19
Related papers
50 records in total
  • [21] Disentangling Label Distribution for Long-tailed Visual Recognition
    Hong, Youngkyu
    Han, Seungju
    Choi, Kwanghee
    Seo, Seokjun
    Kim, Beomsu
    Chang, Buru
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 6622 - 6632
  • [23] A dual progressive strategy for long-tailed visual recognition
    Liang, Hong
    Cao, Guoqing
    Shao, Mingwen
    Zhang, Qian
    MACHINE VISION AND APPLICATIONS, 2024, 35 (01)
  • [24] Self Supervision to Distillation for Long-Tailed Visual Recognition
    Li, Tianhao
    Wang, Limin
    Wu, Gangshan
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 610 - 619
  • [25] bt-vMF Contrastive and Collaborative Learning for Long-Tailed Visual Recognition
    Du, Jinhao
    Luo, Guibo
    Zhu, Yuesheng
    Bai, Zhiqiang
    2023 IEEE 35TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, ICTAI, 2023, : 573 - 577
  • [26] NCL++: Nested Collaborative Learning for long-tailed visual recognition
    Tan, Zichang
    Li, Jun
    Du, Jinhao
    Wan, Jun
    Lei, Zhen
    Guo, Guodong
    PATTERN RECOGNITION, 2024, 147
  • [27] Dynamic collaborative learning with heterogeneous knowledge transfer for long-tailed visual recognition
    Zhou, Hao
    Luo, Tingjin
    He, Yongming
    INFORMATION FUSION, 2025, 115
  • [28] Key Point Sensitive Loss for Long-Tailed Visual Recognition
    Li, Mengke
    Cheung, Yiu-Ming
    Hu, Zhikai
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (04) : 4812 - 4825
  • [29] Dynamic Learnable Logit Adjustment for Long-Tailed Visual Recognition
    Zhang, Enhao
    Geng, Chuanxing
    Li, Chaohua
    Chen, Songcan
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (09) : 7986 - 7997
  • [30] Feature Re-Balancing for Long-Tailed Visual Recognition
    Zhao, Yan
    Chen, Weicong
    Huang, Kai
    Zhu, Jihong
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022