Swin Transformer Embedding UNet for Remote Sensing Image Semantic Segmentation

Cited by: 292
Authors
He, Xin [1, 2]
Zhou, Yong [1, 2]
Zhao, Jiaqi [1, 2]
Zhang, Di [1, 2]
Yao, Rui [1, 2]
Xue, Yong [3, 4]
Affiliations
[1] China Univ Min & Technol, Sch Comp Sci & Technol, Xuzhou 221116, Jiangsu, Peoples R China
[2] Minist Educ Peoples Republ China, Engn Res Ctr Mine Digitizat, Xuzhou 221116, Jiangsu, Peoples R China
[3] China Univ Min & Technol, Sch Environm Sci & Spatial Informat, Xuzhou 221116, Jiangsu, Peoples R China
[4] Univ Derby, Sch Elect Comp & Math, Derby DE22 1GB, England
Funding
National Natural Science Foundation of China
Keywords
Transformers; Semantics; Image segmentation; Feature extraction; Convolutional neural networks; Remote sensing; Task analysis; Global information embedding; remote sensing (RS); semantic segmentation; Swin transformer; CLASSIFICATION; RECOGNITION;
DOI
10.1109/TGRS.2022.3144165
Chinese Library Classification (CLC) number
P3 [Geophysics]; P59 [Geochemistry]
Discipline classification code
0708; 070902
Abstract
Global context information is essential for the semantic segmentation of remote sensing (RS) images. However, most existing methods rely on a convolutional neural network (CNN), which struggles to capture global context directly because the convolution operation is local. Inspired by the Swin transformer and its powerful global modeling capabilities, we propose a novel semantic segmentation framework for RS images called ST-UNet, which embeds the Swin transformer into the classical CNN-based U-shaped network (UNet). ST-UNet constitutes a novel dual encoder structure in which the Swin transformer and the CNN run in parallel. First, we propose a spatial interaction module (SIM), which encodes spatial information in the Swin transformer block by establishing pixel-level correlation, enhancing the feature representation of occluded objects. Second, we construct a feature compression module (FCM) that reduces the loss of detailed information and condenses more small-scale features during patch token downsampling in the Swin transformer, which improves the segmentation accuracy of small-scale ground objects. Finally, as a bridge between the two encoders, a relational aggregation module (RAM) is designed to hierarchically integrate global dependencies from the Swin transformer into the CNN features. Our ST-UNet brings significant improvement on the ISPRS-Vaihingen and Potsdam datasets. The code will be available at https://github.com/XinnHe/ST-UNet.
Pages: 15