Learning Common Rationale to Improve Self-Supervised Representation for Fine-Grained Visual Recognition Problems

Cited by: 2

Authors:

Shu, Yangyang [1]

van den Hengel, Anton [1]

Liu, Lingqiao [1]

Affiliation:

[1] Univ Adelaide, Sch Comp Sci, Adelaide, SA, Australia

Source:

2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2023

Keywords:

DOI:

10.1109/CVPR52729.2023.01096

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Self-supervised learning (SSL) strategies have demonstrated remarkable performance in various recognition tasks. However, both our preliminary investigation and recent studies suggest that they may be less effective in learning representations for fine-grained visual recognition (FGVR), since many features helpful for optimizing SSL objectives are not suitable for characterizing the subtle differences in FGVR. To overcome this issue, we propose learning an additional screening mechanism to identify discriminative clues commonly seen across instances and classes, dubbed common rationales in this paper. Intuitively, common rationales tend to correspond to the discriminative patterns from the key parts of foreground objects. We show that a common rationale detector can be learned by simply exploiting the GradCAM induced from the SSL objective, without using any pre-trained object parts or saliency detectors, so that it can be integrated seamlessly with the existing SSL process. Specifically, we fit the GradCAM with a branch of limited fitting capacity, which allows the branch to capture the common rationales and discard the less common discriminative patterns. At the test stage, the branch generates a set of spatial weights to selectively aggregate the features representing an instance. Extensive experimental results on four visual tasks demonstrate that the proposed method can lead to a significant improvement in different evaluation settings.
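
The test-stage step described in the abstract, where spatial weights selectively aggregate features, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `weighted_aggregate`, the (C, H, W) feature layout, the clipping of negative weights, and the sum-to-one normalization are all assumptions made for illustration only.

```python
import numpy as np


def weighted_aggregate(features, weights, eps=1e-8):
    """Pool a (C, H, W) feature map into a C-dim vector using (H, W) spatial weights.

    Hypothetical helper: the record does not specify how the weights are
    normalized, so clipping at zero and sum-to-one scaling are assumptions.
    """
    features = np.asarray(features, dtype=np.float64)
    weights = np.asarray(weights, dtype=np.float64)
    if features.ndim != 3 or features.shape[1:] != weights.shape:
        raise ValueError("weights must match the spatial size of features")
    # Keep only non-negative weights, then normalize them to sum to (almost) 1.
    w = np.clip(weights, 0.0, None)
    w = w / (w.sum() + eps)
    # Weighted sum over all spatial positions, one value per channel.
    return (features * w[None, :, :]).sum(axis=(1, 2))


# Toy example: 2 channels on a 2x2 grid, weights select the diagonal positions.
feats = np.arange(8, dtype=float).reshape(2, 2, 2)
mask = np.array([[1.0, 0.0], [0.0, 1.0]])
pooled = weighted_aggregate(feats, mask)
```

In this toy case each selected position receives a weight of about 0.5, so each channel of `pooled` is the mean of that channel's two diagonal entries. A uniform weight map would reduce the operation to ordinary global average pooling.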


Pages: 11392-11401

Page count: 10
