VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition

Cited by: 15

Authors:

Tian, Changyao ^{[1,4]}

Wang, Wenhai ^{[3]}

Zhu, Xizhou ^{[2]}

Dai, Jifeng ^{[2]}

Qiao, Yu ^{[3]}

Affiliations:

[1] Chinese Univ Hong Kong, Hong Kong, Peoples R China

[2] SenseTime, Hong Kong, Peoples R China

[3] Shanghai AI Lab, Shanghai, Peoples R China

[4] SenseTime Res, Hong Kong, Peoples R China

Source:

COMPUTER VISION, ECCV 2022, PT XXV | 2022 / Vol. 13685

Keywords:

Long-tailed recognition; Vision-language models; SMOTE;

DOI:

10.1007/978-3-031-19806-9_5

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Recently, computer vision foundation models such as CLIP and ALIGN have shown impressive generalization capabilities on various downstream tasks. But their ability to deal with long-tailed data still remains to be proved. In this work, we present a novel framework based on pre-trained visual-linguistic models for long-tailed recognition (LTR), termed VL-LTR, and conduct empirical studies on the benefits of introducing the text modality for long-tailed recognition tasks. Compared to existing approaches, the proposed VL-LTR has the following merits. (1) Our method can not only learn visual representations from images but also learn the corresponding linguistic representations from noisy class-level text descriptions collected from the Internet; (2) Our method can effectively use the learned visual-linguistic representations to improve visual recognition performance, especially for classes with fewer image samples. We also conduct extensive experiments and set a new state of the art on widely used LTR benchmarks. Notably, our method achieves 77.2% overall accuracy on ImageNet-LT, which outperforms the previous best method by over 17 points and is close to the prevailing performance of training on the full ImageNet. Code is available at https://github.com/ChangyaoTian/VL-LTR.


Pages: 73-91

Page count: 19

