A Lightweight Dual-Branch Swin Transformer for Remote Sensing Scene Classification

Times cited: 11
Authors
Zheng, Fujian [1]
Lin, Shuai [2]
Zhou, Wei [3]
Huang, Hong [1]
Affiliations
[1] Chongqing Univ, Key Lab Optoelect Technol & Syst, Educ Minist China, Chongqing 400044, Peoples R China
[2] Shandong Nonmet Mat Inst, Linyi 250031, Peoples R China
[3] Chongqing Univ Sci & Technol, Sch Intelligent Technol & Engn, Chongqing 401331, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
remote sensing scene classification; convolutional neural networks (CNNs); transfer learning; vision transformer (ViT)
DOI
10.3390/rs15112865
CLC classification number
X [Environmental Science, Safety Science]
Discipline code
08; 0830
Abstract
The main challenge of scene classification is understanding the semantic context of high-resolution remote sensing images. Although vision transformer (ViT)-based methods have been explored to capture long-range dependencies in high-resolution remote sensing images, the connectivity between neighboring windows remains limited. Meanwhile, ViT-based methods commonly contain a large number of parameters, which results in heavy computational cost. In this paper, a novel lightweight dual-branch Swin Transformer (LDBST) method for remote sensing scene classification is proposed, which increases the discriminative ability of scene features by combining a ViT branch and a convolutional neural network (CNN) branch. First, based on the hierarchical Swin Transformer model, LDBST divides the input features of each stage into two parts, which are fed separately into the two branches. In the ViT branch, a dual multilayer perceptron structure with a depthwise convolutional layer, termed Conv-MLP, is integrated to strengthen the connections with neighboring windows. In the CNN branch, a simple structure with maximum pooling preserves the strong features of the scene feature map. Because this branch avoids complex multi-head attention and multilayer perceptron computations, it makes LDBST lighter. To obtain better feature representations, LDBST was pretrained on the large-scale remote sensing scene classification images of the MLRSN and RSD46-WHU datasets, and the two resulting sets of pretrained weights were fine-tuned on the target scene classification datasets. The experimental results showed that the proposed LDBST method was more effective than several other advanced remote sensing scene classification methods.
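The abstract describes the per-stage channel split and the two branches only at a high level. The following NumPy sketch illustrates that data flow for a single stage under stated assumptions; it is not the authors' implementation. Assumptions (all hypothetical, not taken from the paper): the channels are split evenly and the branch outputs are concatenated; the window attention of the ViT branch is omitted, so only a Conv-MLP stand-in (1x1 expansion, 3x3 depthwise convolution, GELU, 1x1 projection, plus a residual connection) represents that branch; the CNN branch is modelled as 3x3 max pooling with stride 1 followed by a 1x1 convolution; the expansion ratio of 4 and every function name (`dual_branch_stage`, `conv_mlp`, `init_params`, `count_params`, ...) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)


def pointwise(x, weight, bias):
    """1x1 convolution (per-pixel linear layer). x: (C_in, H, W), weight: (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]


def depthwise_conv3x3(x, kernels):
    """Depthwise 3x3 convolution, stride 1, zero padding 1. x: (C, H, W), kernels: (C, 3, 3)."""
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            # Each channel is mixed only with its own spatial neighbours.
            out += kernels[:, i, j][:, None, None] * padded[:, i:i + h, j:j + w]
    return out


def gelu(x):
    """GELU, tanh approximation."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


def max_pool3x3(x):
    """3x3 max pooling, stride 1, padding 1 (padded with -inf so borders are unaffected)."""
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)), constant_values=-np.inf)
    out = np.full_like(x, -np.inf)
    for i in range(3):
        for j in range(3):
            out = np.maximum(out, padded[:, i:i + h, j:j + w])
    return out


def conv_mlp(x, p):
    """Hypothetical Conv-MLP: expand, depthwise 3x3 mixing, GELU, project back."""
    hidden = pointwise(x, p["w1"], p["b1"])
    hidden = gelu(depthwise_conv3x3(hidden, p["dw"]))
    return pointwise(hidden, p["w2"], p["b2"])


def dual_branch_stage(x, params):
    """Split channels, run the two branches, concatenate along the channel axis."""
    c = x.shape[0]
    vit_in, cnn_in = x[: c // 2], x[c // 2:]
    vit_out = vit_in + conv_mlp(vit_in, params["mlp"])  # residual, attention omitted
    cnn_out = pointwise(max_pool3x3(cnn_in), params["cnn_w"], params["cnn_b"])
    return np.concatenate([vit_out, cnn_out], axis=0)


def init_params(channels, expansion=4):
    """Random weights for one stage (illustrative initialisation only)."""
    half = channels // 2
    rest = channels - half
    hidden = half * expansion
    return {
        "mlp": {
            "w1": rng.normal(0, 0.02, (hidden, half)),
            "b1": np.zeros(hidden),
            "dw": rng.normal(0, 0.02, (hidden, 3, 3)),
            "w2": rng.normal(0, 0.02, (half, hidden)),
            "b2": np.zeros(half),
        },
        "cnn_w": rng.normal(0, 0.02, (rest, rest)),
        "cnn_b": np.zeros(rest),
    }


def count_params(tree):
    """Recursively count scalars in a nested dict of arrays."""
    if isinstance(tree, dict):
        return sum(count_params(v) for v in tree.values())
    return tree.size


params = init_params(8)
x = rng.normal(size=(8, 7, 7))  # (channels, height, width)
print(dual_branch_stage(x, params).shape)  # (8, 7, 7)
print(count_params(params["mlp"]))  # 292
print(count_params({"w": params["cnn_w"], "b": params["cnn_b"]}))  # 20
```

In this toy configuration (8 channels, expansion ratio 4), the Conv-MLP stand-in holds 292 parameters while the max-pool CNN branch holds 20, which hints at why routing half of the channels through a simple CNN path can reduce model size. Actual LDBST parameter counts depend on settings not given in this record.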
Pages: 19