A Lightweight Dual-Branch Swin Transformer for Remote Sensing Scene Classification

Cited by: 11
Authors
Zheng, Fujian [1]
Lin, Shuai [2]
Zhou, Wei [3]
Huang, Hong [1]
Affiliations
[1] Chongqing Univ, Key Lab Optoelect Technol & Syst, Educ Minist China, Chongqing 400044, Peoples R China
[2] Shandong Nonmet Mat Inst, Linyi 250031, Peoples R China
[3] Chongqing Univ Sci & Technol, Sch Intelligent Technol & Engn, Chongqing 401331, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
remote sensing scene classification; convolutional neural networks (CNNs); transfer learning; vision transformer (ViT);
DOI
10.3390/rs15112865
Chinese Library Classification (CLC) number
X [Environmental Science, Safety Science];
Discipline classification code
08 ; 0830 ;
Abstract
The main challenge in scene classification is understanding the semantic context of high-resolution remote sensing images. Although vision transformer (ViT)-based methods have been explored to capture long-range dependencies in such images, the connectivity between neighboring windows remains limited. Moreover, ViT-based methods commonly contain a large number of parameters, resulting in high computational cost. This paper proposes a novel lightweight dual-branch Swin Transformer (LDBST) for remote sensing scene classification, which increases the discriminative ability of scene features by combining a ViT branch with a convolutional neural network (CNN) branch. First, building on the hierarchical Swin Transformer model, LDBST divides the input features of each stage into two parts, which are fed separately into the two branches. In the ViT branch, a dual multilayer perceptron structure with a depthwise convolutional layer, termed Conv-MLP, is integrated to strengthen the connections between neighboring windows. For the CNN branch, a simply structured design with max pooling preserves the strong features of the scene feature map. Specifically, the CNN branch makes LDBST lighter by avoiding complex multi-head attention and multilayer perceptron computations. To obtain better feature representations, LDBST was pretrained on the large-scale remote sensing scene classification images of the MLRSN and RSD46-WHU datasets, and the two sets of pretrained weights were then fine-tuned on the target scene classification datasets. The experimental results showed that the proposed LDBST method was more effective than some other advanced remote sensing scene classification methods.
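The claim that routing part of the features through a CNN branch lightens the model can be illustrated with a rough parameter count. The sketch below is not the authors' implementation. It assumes an equal channel split, an MLP expansion ratio of 4, a Conv-MLP approximated as a standard two-layer MLP with a 3x3 depthwise convolution on the hidden features, and a CNN branch made of a single 3x3 convolution followed by parameter-free max pooling. None of these details are stated in the abstract, and all function names are hypothetical.

```python
# Rough weight count for one block (biases, norms and position tables ignored).
# ASSUMPTIONS, not taken from the abstract: equal channel split, MLP ratio 4,
# 3x3 depthwise conv inside Conv-MLP, one 3x3 conv in the CNN branch.


def attention_params(c: int) -> int:
    """Self-attention weights: QKV projection (3*c*c) plus output projection (c*c)."""
    return 4 * c * c


def mlp_params(c: int, ratio: int = 4) -> int:
    """Two linear layers: c -> ratio*c -> c."""
    return 2 * ratio * c * c


def depthwise_conv_params(c: int, k: int = 3) -> int:
    """Depthwise k x k convolution: one k*k filter per channel."""
    return k * k * c


def conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Standard k x k convolution."""
    return k * k * c_in * c_out


def plain_block_params(c: int, ratio: int = 4) -> int:
    """Single-branch transformer block applied to all c channels."""
    return attention_params(c) + mlp_params(c, ratio)


def dual_branch_block_params(c: int, ratio: int = 4) -> int:
    """Hypothetical dual-branch block: half the channels per branch."""
    c_vit = c // 2
    c_cnn = c - c_vit
    vit = (
        attention_params(c_vit)
        + mlp_params(c_vit, ratio)
        + depthwise_conv_params(ratio * c_vit)  # conv on the hidden MLP features
    )
    cnn = conv_params(c_cnn, c_cnn)  # max pooling adds no parameters
    return vit + cnn


if __name__ == "__main__":
    c = 96  # illustrative channel width, chosen arbitrarily
    print("single-branch block:", plain_block_params(c))
    print("dual-branch block:  ", dual_branch_block_params(c))
```

Under these toy assumptions the dual-branch block has fewer than half the weights of the single-branch block, because attention and MLP weights grow quadratically with the channel count. The actual savings in LDBST depend on the real split ratio and layer design, which the abstract does not give.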
Pages: 19