Learning Common Rationale to Improve Self-Supervised Representation for Fine-Grained Visual Recognition Problems

Cited by: 2
Authors
Shu, Yangyang [1 ]
van den Hengel, Anton [1 ]
Liu, Lingqiao [1 ]
Affiliations
[1] Univ Adelaide, Sch Comp Sci, Adelaide, SA, Australia
DOI
10.1109/CVPR52729.2023.01096
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Self-supervised learning (SSL) strategies have demonstrated remarkable performance in various recognition tasks. However, both our preliminary investigation and recent studies suggest that they may be less effective at learning representations for fine-grained visual recognition (FGVR), since many features that help optimize SSL objectives are not suitable for characterizing the subtle differences in FGVR. To overcome this issue, we propose learning an additional screening mechanism to identify discriminative clues commonly seen across instances and classes, which we call common rationales. Intuitively, common rationales tend to correspond to the discriminative patterns from the key parts of foreground objects. We show that a common rationale detector can be learned by simply exploiting the GradCAM induced from the SSL objective, without using any pre-trained object part or saliency detectors, so it can be seamlessly integrated with the existing SSL process. Specifically, we fit the GradCAM with a branch of limited fitting capacity, which allows the branch to capture the common rationales and discard the less common discriminative patterns. At the test stage, the branch generates a set of spatial weights to selectively aggregate the features representing an instance. Extensive experimental results on four visual tasks demonstrate that the proposed method leads to significant improvements across different evaluation settings.
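The test-stage aggregation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `spatial_weights` and `weighted_pool`, the choice of a single linear map over channels followed by a ReLU as the "limited capacity" branch, and the uniform fallback are all assumptions made for illustration. The GradCAM fitting used during training is not shown.

```python
def spatial_weights(features, channel_weights):
    """Hypothetical low-capacity branch (an assumption, not the paper's design):
    one linear map over channels at each location, followed by a ReLU.
    `features` is a C x H x W nested list; returns an H x W weight map."""
    channels = len(features)
    height, width = len(features[0]), len(features[0][0])
    weights = []
    for y in range(height):
        row = []
        for x in range(width):
            score = sum(channel_weights[c] * features[c][y][x] for c in range(channels))
            row.append(max(score, 0.0))  # keep weights non-negative
        weights.append(row)
    return weights


def weighted_pool(features, weights):
    """Aggregate a C x H x W feature map into a C-vector using the
    spatial weights, normalized so that they sum to one."""
    height, width = len(weights), len(weights[0])
    total = sum(sum(row) for row in weights)
    if total == 0:
        # Assumed fallback: plain average pooling when every weight is zero.
        weights = [[1.0] * width for _ in range(height)]
        total = float(height * width)
    return [
        sum(weights[y][x] * channel[y][x] for y in range(height) for x in range(width)) / total
        for channel in features
    ]


# Toy 2-channel, 2 x 2 feature map; the branch attends to the right column.
features = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 1.0], [0.0, 1.0]]]
pooled = weighted_pool(features, spatial_weights(features, [0.0, 1.0]))
```

In this toy case only the right-hand column receives weight, so the pooled vector averages the features at those two locations and ignores the rest, which is the selective aggregation the abstract describes.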
Pages: 11392-11401
Page count: 10