Patch-Level Consistency Regularization in Self-Supervised Transfer Learning for Fine-Grained Image Recognition

Cited by: 0

Authors:

Lee, Yejin [1]

Lee, Suho [1]

Hwang, Sangheum [1,2,3]

Affiliations:

[1] Seoul Natl Univ Sci & Technol, Dept Data Sci, Seoul 01811, South Korea

[2] Seoul Natl Univ Sci & Technol, Dept Ind & Informat Syst Engn, Seoul 01811, South Korea

[3] Seoul Natl Univ Sci & Technol, Res Ctr Elect & Informat Technol, Seoul 01811, South Korea

Source:

APPLIED SCIENCES-BASEL | 2023, Vol. 13, Issue 18

Funding:

National Research Foundation of Singapore;

Keywords:

self-supervised learning; fine-grained image recognition; transfer learning; Vision Transformer;

DOI:

10.3390/app131810493

Chinese Library Classification (CLC) number:

O6 [Chemistry];

Discipline classification code:

0703;

Abstract:

Fine-grained image recognition aims to classify fine subcategories belonging to the same parent category, such as vehicle model or bird species classification. This is an inherently challenging task because a classifier must capture subtle interclass differences under large intraclass variances. Most previous approaches are based on supervised learning, which requires a large-scale labeled dataset. However, such large-scale annotated datasets for fine-grained image recognition are difficult to collect because they generally require domain expertise during the labeling process. In this study, we propose a self-supervised transfer learning method based on Vision Transformer (ViT) to learn finer representations without human annotations. Interestingly, it is observed that existing self-supervised learning methods using ViT (e.g., DINO) show poor patch-level semantic consistency, which may be detrimental to learning finer representations. Motivated by this observation, we propose a consistency loss function that encourages patch embeddings of the overlapping area between two augmented views to be similar to each other during self-supervised learning on fine-grained datasets. In addition, we explore effective transfer learning strategies to fully leverage existing self-supervised models trained on large-scale labeled datasets. Contrary to the previous literature, our findings indicate that training only the last block of ViT is effective for self-supervised transfer learning. We demonstrate the effectiveness of our proposed approach through extensive experiments using six fine-grained image classification benchmark datasets, including FGVC Aircraft, CUB-200-2011, Food-101, Oxford 102 Flowers, Stanford Cars, and Stanford Dogs. Under the linear evaluation protocol, our method achieves an average accuracy of 78.5%, outperforming the existing transfer learning method, which yields 77.2%.
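The abstract describes the consistency loss only in words, so the following is a minimal sketch of one way such a loss could be computed, not the authors' implementation. It assumes two views that are axis-aligned crops of the same image, each split into a square grid of patches, with no flips or other geometric augmentations. The function names, the center-in-box matching rule, and the use of 1 minus cosine similarity as the distance are all assumptions made for illustration.

```python
import math

# Hypothetical helper names; the matching rule below is an assumption.

def patch_centers(box, grid):
    """Centers (in image coordinates) of a grid x grid patch layout in a crop box."""
    x0, y0, x1, y1 = box
    pw, ph = (x1 - x0) / grid, (y1 - y0) / grid
    return [(x0 + (c + 0.5) * pw, y0 + (r + 0.5) * ph)
            for r in range(grid) for c in range(grid)]

def patch_index(box, grid, point):
    """Row-major index of the patch of `box` that contains `point`, or None."""
    x0, y0, x1, y1 = box
    x, y = point
    if not (x0 <= x < x1 and y0 <= y < y1):
        return None
    c = min(grid - 1, int((x - x0) / (x1 - x0) * grid))
    r = min(grid - 1, int((y - y0) / (y1 - y0) * grid))
    return r * grid + c

def cosine(u, v):
    """Cosine similarity of two vectors given as sequences of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)

def patch_consistency_loss(emb_a, emb_b, box_a, box_b, grid):
    """Mean (1 - cosine similarity) over view-A patches whose centers fall inside view B.

    emb_a and emb_b are row-major lists of patch embeddings (grid * grid each).
    Returns 0.0 when the two crops do not overlap at any patch center.
    """
    losses = []
    for i, center in enumerate(patch_centers(box_a, grid)):
        j = patch_index(box_b, grid, center)
        if j is not None:
            losses.append(1.0 - cosine(emb_a[i], emb_b[j]))
    return sum(losses) / len(losses) if losses else 0.0

# Usage: two 2x2-patch crops that overlap in one quadrant.
box_a, box_b = (0, 0, 2, 2), (1, 1, 3, 3)
emb_a = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
emb_b = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
loss = patch_consistency_loss(emb_a, emb_b, box_a, box_b, grid=2)
```

In this toy setup only the bottom-right patch of the first view lies in the overlap, and it is matched to the top-left patch of the second view. In a real training loop the embeddings would come from the ViT patch tokens and the loss would be added to the self-supervised objective; how the paper weights or combines these terms is not stated in the abstract.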


Pages: 14
