Hybrid ViT-CNN Network for Fine-Grained Image Classification

Times cited: 0
Authors
Shao, Ran [1, 2]
Bi, Xiao-Jun [3, 4]
Chen, Zheng [3, 4]
Affiliations
[1] Harbin Engn Univ, Coll Informat & Commun Engn, Harbin 150000, Peoples R China
[2] Harbin Vocat & Tech Coll, Coll Elect & Informat Engn, Harbin 150000, Peoples R China
[3] Minzu Univ China, Key Lab Ethn Language Intelligent Anal & Secur Gov, Beijing 100081, Peoples R China
[4] Minzu Univ China, Dept Informat Engn, Beijing 100081, Peoples R China
Keywords
Convolutional neural networks; fine-grained visual classification; multi-scale feature; vision transformer
DOI
10.1109/LSP.2024.3386112
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline code
0808; 0809
Abstract
In recent years, the vision transformer (ViT) has achieved remarkable breakthroughs in fine-grained visual classification (FGVC), owing to its self-attention mechanism, which excels at extracting distinctive features from different pixels. However, a pure ViT falls short in capturing multi-scale, local, and low-layer features, which are crucial for FGVC. To compensate for these shortcomings, a new hybrid network called HVCNet is designed that fuses the advantages of ViT and convolutional neural networks (CNNs). It makes three modifications to the original ViT: 1) a multi-scale image-to-tokens (MIT) module replaces direct tokenization of the raw input image, enabling the network to capture features at different scales; 2) a mixed convolution feed-forward (MCF) module replaces the feed-forward network in ViT's encoder, enhancing the network's ability to capture local and multi-scale features; 3) a multi-layer feature selection (MFS) module addresses ViT's reliance on deep-layer tokens, so that local and low-layer features are not ignored. Experimental results indicate that the proposed method surpasses state-of-the-art methods on public datasets.
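The abstract describes the MFS module only at a high level. As a rough illustration of the general idea of drawing tokens from several encoder layers instead of relying only on the last one, the Python sketch below scores patch tokens in chosen layers by the CLS token's attention (averaged over heads), keeps the top k tokens from each layer, and concatenates them with the final CLS token. The CLS-attention scoring rule, the fixed k, and the names `cls_attention_scores`, `select_top_tokens`, and `multi_layer_selection` are hypothetical assumptions for illustration, not HVCNet's published design. Plain lists are used so the sketch runs without a deep-learning framework.

```python
"""Hypothetical sketch of multi-layer token selection for a ViT.

Assumption: this is NOT the HVCNet MFS module. The abstract does not say how
tokens are scored or how many are kept, so CLS-attention scoring and a fixed
top-k per layer are illustrative choices only.
"""
from typing import List, Sequence

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def cls_attention_scores(attn_heads: Sequence[Matrix]) -> List[float]:
    """Average, over heads, the CLS token's attention to each patch token.

    Each head is an (N+1) x (N+1) attention matrix whose row/column 0 is CLS;
    only row 0 is read. Returns N scores, one per patch token.
    """
    num_heads = len(attn_heads)
    num_patches = len(attn_heads[0][0]) - 1
    return [
        sum(head[0][j + 1] for head in attn_heads) / num_heads
        for j in range(num_patches)
    ]


def select_top_tokens(tokens: Sequence[Vector], scores: Sequence[float], k: int) -> List[Vector]:
    """Keep the k highest-scoring tokens, preserving their original (spatial) order."""
    if len(tokens) != len(scores):
        raise ValueError("need exactly one score per token")
    if not 0 < k <= len(tokens):
        raise ValueError("k must be between 1 and the number of tokens")
    # sorted() is stable, so ties keep the earlier token first.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return [list(tokens[i]) for i in sorted(top)]


def multi_layer_selection(
    layer_tokens: Sequence[Sequence[Vector]],
    layer_scores: Sequence[Sequence[float]],
    layers: Sequence[int],
    k: int,
    cls_token: Vector,
) -> List[Vector]:
    """Concatenate the final CLS token with the top-k patch tokens of each chosen layer.

    Taking tokens from shallow as well as deep layers is one way to retain
    local, low-layer detail that the last layer alone may have smoothed away.
    """
    selected = [list(cls_token)]
    for layer in layers:
        selected.extend(select_top_tokens(layer_tokens[layer], layer_scores[layer], k))
    return selected


if __name__ == "__main__":
    # Toy input: 3 layers, each with 4 patch tokens of dimension 2.
    tokens = [[[float(10 * layer + i)] * 2 for i in range(4)] for layer in range(3)]
    scores = [[0.1, 0.9, 0.3, 0.5], [0.7, 0.2, 0.8, 0.1], [0.4, 0.4, 0.6, 0.0]]
    print(multi_layer_selection(tokens, scores, layers=[0, 2], k=2, cls_token=[9.0, 9.0]))
    # [[9.0, 9.0], [1.0, 1.0], [3.0, 3.0], [20.0, 20.0], [22.0, 22.0]]
```

In a real model the selected tokens would typically be fed to a final transformer layer or classifier head; the choice of which layers to sample and how large k should be would need to be tuned, and the paper itself should be consulted for the actual MFS design.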
Pages: 1109-1113
Page count: 5