Patch-Level Consistency Regularization in Self-Supervised Transfer Learning for Fine-Grained Image Recognition

Authors
Lee, Yejin [1]
Lee, Suho [1]
Hwang, Sangheum [1, 2, 3]
Affiliations
[1] Seoul Natl Univ Sci & Technol, Dept Data Sci, Seoul 01811, South Korea
[2] Seoul Natl Univ Sci & Technol, Dept Ind & Informat Syst Engn, Seoul 01811, South Korea
[3] Seoul Natl Univ Sci & Technol, Res Ctr Elect & Informat Technol, Seoul 01811, South Korea
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 18
Funding
National Research Foundation, Singapore;
Keywords
self-supervised learning; fine-grained image recognition; transfer learning; Vision Transformer;
DOI
10.3390/app131810493
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Fine-grained image recognition aims to classify fine subcategories belonging to the same parent category, such as vehicle model or bird species classification. This is an inherently challenging task because a classifier must capture subtle interclass differences under large intraclass variances. Most previous approaches are based on supervised learning, which requires a large-scale labeled dataset. However, such large-scale annotated datasets for fine-grained image recognition are difficult to collect because they generally require domain expertise during the labeling process. In this study, we propose a self-supervised transfer learning method based on Vision Transformer (ViT) to learn finer representations without human annotations. Interestingly, it is observed that existing self-supervised learning methods using ViT (e.g., DINO) show poor patch-level semantic consistency, which may be detrimental to learning finer representations. Motivated by this observation, we propose a consistency loss function that encourages patch embeddings of the overlapping area between two augmented views to be similar to each other during self-supervised learning on fine-grained datasets. In addition, we explore effective transfer learning strategies to fully leverage existing self-supervised models trained on large-scale labeled datasets. Contrary to the previous literature, our findings indicate that training only the last block of ViT is effective for self-supervised transfer learning. We demonstrate the effectiveness of our proposed approach through extensive experiments using six fine-grained image classification benchmark datasets, including FGVC Aircraft, CUB-200-2011, Food-101, Oxford 102 Flowers, Stanford Cars, and Stanford Dogs. Under the linear evaluation protocol, our method achieves an average accuracy of 78.5%, outperforming the existing transfer learning method, which yields 77.2%.
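The patch-level consistency idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact loss, so the form below (mean of one minus cosine similarity over paired patch embeddings) is an assumption, not the authors' formulation. The function names `cosine_similarity` and `patch_consistency_loss` are hypothetical, and the sketch assumes the patches from the overlapping area of the two augmented views have already been matched by position.

```python
import math


def cosine_similarity(u, v, eps=1e-12):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # eps guards against division by zero for all-zero embeddings.
    return dot / max(norm_u * norm_v, eps)


def patch_consistency_loss(patches_a, patches_b):
    """Mean (1 - cosine similarity) over paired patch embeddings.

    Assumption (not from the source): patches_a[i] and patches_b[i] are
    embeddings of the same image location inside the overlap of two
    augmented views, so pairing by index is valid.
    """
    if not patches_a or len(patches_a) != len(patches_b):
        raise ValueError("expected two non-empty lists of equal length")
    total = sum(
        1.0 - cosine_similarity(u, v) for u, v in zip(patches_a, patches_b)
    )
    return total / len(patches_a)
```

Under this assumed form, paired embeddings that point in the same direction contribute zero loss and orthogonal pairs contribute one, so minimizing it pushes overlapping patches toward similar representations.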
Pages: 14