Transformer based on channel-spatial attention for accurate classification of scenes in remote sensing image

Cited by: 21
Authors
Guo, Jingxia [1]
Jia, Nan [1]
Bai, Jinniu [1]
Affiliation
[1] Baotou Med Coll, Baotou 014040, Inner Mongolia, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
NETWORK; FUSION; RECOGNITION;
DOI
10.1038/s41598-022-19831-z
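Chinese Library Classification number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code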
07 ; 0710 ; 09 ;
Abstract
Recently, scenes in large high-resolution remote sensing (HRRS) datasets have been classified using convolutional neural network (CNN)-based methods. Such methods are well suited to spatial feature extraction and can classify images with relatively high accuracy. However, CNNs do not adequately learn the long-distance dependencies between images and features, even though these dependencies are necessary for HRRS image processing because the semantic content of a scene is closely related to its spatial relationships. CNNs also struggle with large intra-class differences and high inter-class similarity. To overcome these challenges, this study combines the channel-spatial attention (CSA) mechanism with the Vision Transformer to propose an effective HRRS image scene classification framework, the Channel-Spatial Attention Transformer (CSAT). The proposed model extracts the channel and spatial features of HRRS images using CSA and the Multi-head Self-Attention (MSA) mechanism in the transformer module. First, the HRRS image is passed through the CSA and then mapped into a sequence of flattened 2D patch vectors. Second, an ordered sequence is obtained by applying a linear transformation to each vector, and position embeddings and a learnable embedding vector are added to the sequence to capture long-distance dependencies between features in the image. Next, MSA is used to extract image features, and a residual network structure completes the encoder, which addresses the vanishing gradient problem and helps avoid overfitting. Finally, a multi-layer perceptron classifies the scenes in the HRRS images. The CSAT network is evaluated on three public remote sensing scene image datasets: UC-Merced, AID, and NWPU-RESISC45. The experimental results show that the proposed CSAT network outperforms a selection of state-of-the-art methods in scene classification.
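The four steps in the abstract can be sketched as a forward pass in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the CBAM-like gating inside `channel_spatial_attention`, the ReLU feed-forward layer, and every size (32x32 input, 8x8 patches, width 32, depth 2, 4 heads) are placeholders chosen for brevity, and the weights are random. The 21 output classes match the UC-Merced dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-6):
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def channel_spatial_attention(img, w_c):
    """Assumed CBAM-like gating (not the paper's exact CSA). img: (H, W, C)."""
    # Channel gate from average- and max-pooled channel descriptors.
    gate_c = sigmoid(img.mean(axis=(0, 1)) @ w_c + img.max(axis=(0, 1)) @ w_c)
    img = img * gate_c
    # Spatial gate from per-pixel mean and max over channels (no convolution).
    gate_s = sigmoid(img.mean(axis=2) + img.max(axis=2))
    return img * gate_s[..., None]

def patchify(img, p):
    """Split (H, W, C) into non-overlapping p x p patches, shape (N, p*p*C)."""
    h, w, c = img.shape
    x = img.reshape(h // p, p, w // p, p, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, p * p * c)

def msa(x, wq, wk, wv, wo, heads):
    """Multi-head self-attention over a (T, D) token sequence."""
    t, d = x.shape
    dh = d // heads
    def split(m):
        return (x @ m).reshape(t, heads, dh).transpose(1, 0, 2)  # (heads, T, dh)
    q, k, v = split(wq), split(wk), split(wv)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))       # (heads, T, T)
    return (attn @ v).transpose(1, 0, 2).reshape(t, d) @ wo

def encoder_block(x, blk, heads):
    x = x + msa(layer_norm(x), *blk["attn"], heads)   # residual around MSA
    h = np.maximum(layer_norm(x) @ blk["w1"], 0.0)    # ReLU is an assumption
    return x + h @ blk["w2"]                          # residual around the MLP

def csat_forward(img, params, patch=8, heads=4):
    x = channel_spatial_attention(img, params["w_c"])              # step 1: CSA
    tokens = patchify(x, patch) @ params["w_embed"]                # linear projection
    tokens = np.vstack([params["cls"], tokens]) + params["pos"]    # step 2: embeddings
    for blk in params["blocks"]:                                   # step 3: encoder
        tokens = encoder_block(tokens, blk, heads)
    return softmax(layer_norm(tokens[0]) @ params["w_head"])       # step 4: classifier

def init_params(h=32, w=32, c=3, patch=8, d=32, depth=2, classes=21):
    n = (h // patch) * (w // patch)
    g = lambda *s: rng.normal(0.0, 0.02, s)
    blocks = [{"attn": [g(d, d) for _ in range(4)],
               "w1": g(d, 4 * d), "w2": g(4 * d, d)} for _ in range(depth)]
    return {"w_c": g(c, c), "w_embed": g(patch * patch * c, d), "cls": g(1, d),
            "pos": g(n + 1, d), "blocks": blocks, "w_head": g(d, classes)}

params = init_params()
image = rng.random((32, 32, 3))          # stand-in for an HRRS image
probs = csat_forward(image, params)      # class probabilities, shape (21,)
```

With untrained weights the probabilities are close to uniform; the sketch only shows how the tensors flow from the attention-gated image to the patch sequence, through the encoder, and into the classification head.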
Pages: 15
Related papers
50 in total
  • [1] Transformer based on channel-spatial attention for accurate classification of scenes in remote sensing image
    Jingxia Guo
    Nan Jia
    Jinniu Bai
    [J]. Scientific Reports, 12
  • [2] CSANet: a channel-spatial attention network for remote sensing image change detection
    Cai, Yuyang
    Liao, Shuhong
    He, Wenxuan
    Huang, Weiliang
    Yan, Jingwen
    Liu, Lei
    [J]. INTERNATIONAL JOURNAL OF REMOTE SENSING, 2023, 44 (19) : 5936 - 5959
  • [3] Residual Dense Network Based on Channel-Spatial Attention for the Scene Classification of a High-Resolution Remote Sensing Image
    Zhao, Xiaolei
    Zhang, Jing
    Tian, Jimiao
    Zhuo, Li
    Zhang, Jie
    [J]. REMOTE SENSING, 2020, 12 (11)
  • [4] Channel-spatial attention network for fewshot classification
    Zhang, Yan
    Fang, Min
    Wang, Nian
    [J]. PLOS ONE, 2019, 14 (12)
  • [5] A multi-scale multi-channel CNN introducing a channel-spatial attention mechanism hyperspectral remote sensing image classification method
    Zhao, Ru
    Zhang, Chaozhu
    Xue, Dan
    [J]. EUROPEAN JOURNAL OF REMOTE SENSING, 2024, 57 (01)
  • [6] Architectural style classification based on CNN and channel-spatial attention
    Wang, Bo
    Zhang, Sulan
    Zhang, Jifu
    Cai, Zhenjiao
    [J]. SIGNAL IMAGE AND VIDEO PROCESSING, 2023, 17 (01) : 99 - 107
  • [7] Image super-resolution with dense-sampling residual channel-spatial attention networks for multi-temporal remote sensing image classification
    Zhu, Yue
    Geiss, Christian
    So, Emily
    [J]. INTERNATIONAL JOURNAL OF APPLIED EARTH OBSERVATION AND GEOINFORMATION, 2021, 104
  • [8] A SPATIAL-CHANNEL ATTENTION-BASED CONVOLUTIONAL NEURAL NETWORK FOR REMOTE SENSING IMAGE CLASSIFICATION
    Shuai, Yuanzhen
    Yuan, Qiao
    Zhao, Shanshan
    [J]. 2022 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2022), 2022 : 3628 - 3631
  • [9] Facial Expression Recognition Based on Fine-Tuned Channel-Spatial Attention Transformer
    Yao, Huang
    Yang, Xiaomeng
    Chen, Di
    Wang, Zhao
    Tian, Yuan
    [J]. SENSORS, 2023, 23 (15)
  • [10] Spatial-Channel Attention Transformer With Pseudo Regions for Remote Sensing Image-Text Retrieval
    Wu, Dongqing
    Li, Huihui
    Hou, Yinxuan
    Xu, Cuili
    Cheng, Gong
    Guo, Lei
    Liu, Hang
    [J]. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62