Multi-Stage Training with Multi-Level Knowledge Self-Distillation for Fine-Grained Image Recognition

Cited by: 0
Authors
Yu Y. [1 ]
Wei W. [1 ]
Tang H. [1 ]
Qian J. [1 ]
Affiliations
[1] School of Software, East China Jiaotong University, Nanchang
Funding
National Natural Science Foundation of China
Keywords
feature learning; fine-grained image recognition; knowledge self-distillation; robust characteristics; Swin Transformer;
DOI
10.7544/issn1000-1239.202330262
Abstract
Fine-grained image recognition is characterized by large intra-class variation and small inter-class variation, and has wide applications in intelligent retail, biodiversity protection, and intelligent transportation. Extracting discriminative multi-granularity features is key to improving the accuracy of fine-grained image recognition. Most existing methods acquire knowledge at only a single level, ignoring the effectiveness of multi-level information interaction for extracting robust features. Other works introduce attention mechanisms to locate discriminative local regions from which discriminative features are extracted, but this inevitably increases network complexity. To address these issues, an MKSMT (multi-level knowledge self-distillation with multi-step training) model for fine-grained image recognition is proposed. The model first learns features in the shallow network, then performs feature learning in the deep network, and uses self-distillation to transfer knowledge from the deep network to the shallow network. The optimized shallow network in turn helps the deep network extract more robust features, improving the performance of the whole model. Experimental results show that MKSMT achieves classification accuracies of 92.8%, 92.6%, and 91.1% on three publicly available fine-grained image datasets, respectively, outperforming most state-of-the-art fine-grained recognition algorithms. © 2023 Science Press. All rights reserved.
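The deep-to-shallow knowledge transfer described in the abstract can be sketched with a standard temperature-scaled KL distillation loss. This is a minimal illustration of the general self-distillation idea, not the authors' implementation: the function names, the temperature value, and the pure-Python setup are assumptions for the sketch.

```python
import math

def softmax(logits, t=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / t for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def self_distillation_loss(shallow_logits, deep_logits, t=4.0):
    """KL(deep || shallow), scaled by t^2 as in standard knowledge
    distillation: the shallow head is trained to mimic the deep head's
    softened predictions. In a real training loop the deep (teacher)
    logits would be detached so they receive no gradient.
    Names and temperature are illustrative, not from the paper."""
    p_teacher = softmax(deep_logits, t)
    p_student = softmax(shallow_logits, t)
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_teacher, p_student))
    return kl * t * t

# Identical predictions incur zero distillation loss.
logits = [2.0, 0.5, -1.0]
print(self_distillation_loss(logits, logits))  # → 0.0
```

The shallow head's total objective would combine this term with an ordinary cross-entropy loss against the ground-truth label, so the shallow network learns both from labels and from the deeper network's richer predictions.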
Pages: 1834-1845
Page count: 11