Transformer-Based Fused Attention Combined with CNNs for Image Classification

Cited: 0
Authors
Jielin Jiang
Hongxiang Xu
Xiaolong Xu
Yan Cui
Jintao Wu
Affiliations
[1] Nanjing University of Information Science and Technology, School of Software
[2] Nanjing University of Information Science and Technology, Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology (CICAEET)
[3] Nanjing Normal University of Special Education, College of Mathematics and Information Science
Source
Neural Processing Letters | 2023, Vol. 55
Keywords
Image classification; Swin transformer; Fusion attention; Residual convolution
DOI
Not available
Abstract
The receptive field of convolutional neural networks (CNNs) focuses on the local context, whereas the transformer's receptive field captures the global context. Transformers have become a new backbone for computer vision thanks to their powerful ability to extract global features, an ability supported by pre-training on extensive amounts of data. However, collecting a large number of high-quality labeled images for the pre-training phase is challenging. This paper therefore proposes a classification network (CofaNet) that combines CNNs with transformer-based fused attention to address the limitations of transformers trained without pre-training, such as low accuracy. CofaNet introduces patch-sequence-dimension attention to capture the relationships among subsequences and incorporates it into self-attention to construct a new attention feature-extraction layer. A residual convolution block then replaces the multilayer perceptron after the fusion attention layer to compensate for the attention layer's limited feature extraction on small datasets. Experimental results on three benchmark datasets demonstrate that CofaNet achieves excellent classification accuracy compared with several transformer-based networks without pre-training.
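The fusion idea described in the abstract (attention over the token axis combined with attention over the patch-sequence dimension) can be sketched as follows. This is a minimal, framework-free illustration, not the paper's implementation: it assumes identity Q/K/V projections, single-head attention, and an element-wise sum as the fusion rule; `fused_attention` and its helpers are hypothetical names.

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of scores
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def matmul(A, B):
    # plain list-of-lists matrix product
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def attention(X):
    # single-head self-attention with identity projections (Q = K = V = X)
    d = len(X[0])
    scores = matmul(X, transpose(X))                                  # (n, n) similarities
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, X)                                         # (n, d)

def fused_attention(X):
    # token attention: relationships among patch tokens (rows of X)
    token_out = attention(X)
    # patch-sequence-dimension attention: attend across the other axis
    # by transposing, attending, and transposing back
    seq_out = transpose(attention(transpose(X)))
    # fuse the two attention outputs by element-wise sum (an assumption here)
    return [[t + s for t, s in zip(tr, sr)] for tr, sr in zip(token_out, seq_out)]

X = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]  # 2 patch tokens, embedding dim 3
Y = fused_attention(X)
print(len(Y), len(Y[0]))  # fused output keeps the input's (tokens, dim) shape
```

In CofaNet this fused layer is reportedly followed by a residual convolution block rather than an MLP; that block is omitted here since its exact structure is not given in the abstract.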
Pages: 11905–11919
Page count: 14