Patch-Level Consistency Regularization in Self-Supervised Transfer Learning for Fine-Grained Image Recognition

Cited by: 0
Authors
Lee, Yejin [1]
Lee, Suho [1]
Hwang, Sangheum [1, 2, 3]
Affiliations
[1] Seoul Natl Univ Sci & Technol, Dept Data Sci, Seoul 01811, South Korea
[2] Seoul Natl Univ Sci & Technol, Dept Ind & Informat Syst Engn, Seoul 01811, South Korea
[3] Seoul Natl Univ Sci & Technol, Res Ctr Elect & Informat Technol, Seoul 01811, South Korea
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 18
Funding
National Research Foundation of Singapore;
Keywords
self-supervised learning; fine-grained image recognition; transfer learning; Vision Transformer;
DOI
10.3390/app131810493
Chinese Library Classification
O6 [Chemistry];
Discipline classification code
0703 ;
Abstract
Fine-grained image recognition aims to classify fine subcategories belonging to the same parent category, such as vehicle model or bird species classification. This is an inherently challenging task because a classifier must capture subtle interclass differences under large intraclass variances. Most previous approaches are based on supervised learning, which requires a large-scale labeled dataset. However, such large-scale annotated datasets for fine-grained image recognition are difficult to collect because they generally require domain expertise during the labeling process. In this study, we propose a self-supervised transfer learning method based on Vision Transformer (ViT) to learn finer representations without human annotations. Interestingly, it is observed that existing self-supervised learning methods using ViT (e.g., DINO) show poor patch-level semantic consistency, which may be detrimental to learning finer representations. Motivated by this observation, we propose a consistency loss function that encourages patch embeddings of the overlapping area between two augmented views to be similar to each other during self-supervised learning on fine-grained datasets. In addition, we explore effective transfer learning strategies to fully leverage existing self-supervised models trained on large-scale labeled datasets. Contrary to the previous literature, our findings indicate that training only the last block of ViT is effective for self-supervised transfer learning. We demonstrate the effectiveness of our proposed approach through extensive experiments using six fine-grained image classification benchmark datasets, including FGVC Aircraft, CUB-200-2011, Food-101, Oxford 102 Flowers, Stanford Cars, and Stanford Dogs. Under the linear evaluation protocol, our method achieves an average accuracy of 78.5%, outperforming the existing transfer learning method, which yields 77.2%.
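To make the idea of a patch-level consistency loss concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not state the exact form of the loss, so the use of cosine similarity, the function names, and the precomputed `matches` list (pairs of patch indices assumed to cover the same image region in the two augmented views) are all illustrative assumptions.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def patch_consistency_loss(patches_a, patches_b, matches):
    """Mean (1 - cosine similarity) over matched patch pairs.

    patches_a, patches_b: patch embeddings of the two augmented views.
    matches: hypothetical list of (i, j) pairs, where patch i of view A and
             patch j of view B are assumed to lie in the overlapping area.
    The cosine-based form is an assumption, not taken from the paper.
    """
    if not matches:
        return 0.0  # no overlap between the two views, nothing to regularize
    total = sum(
        1.0 - cosine_similarity(patches_a[i], patches_b[j]) for i, j in matches
    )
    return total / len(matches)


# Toy 2-D embeddings: three patches per view.
view_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
view_b = [[2.0, 0.0], [1.0, 0.0], [0.0, 3.0]]

# Matched patches point in the same direction, so the loss vanishes.
loss_aligned = patch_consistency_loss(view_a, view_b, [(0, 0), (1, 2)])

# Matched patches are orthogonal, so each pair contributes a loss of 1.
loss_misaligned = patch_consistency_loss(view_a, view_b, [(0, 2), (1, 1)])
```

In this toy setting the loss is low when embeddings of corresponding patches agree and high when they do not, which is the behavior the abstract describes as encouraging patch embeddings in the overlapping area to be similar. In practice such a term would presumably be added to the main self-supervised objective rather than used alone.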
Pages: 14