Swin Transformer Embedding UNet for Remote Sensing Image Semantic Segmentation

Times cited: 292
Authors
He, Xin [1 ,2 ]
Zhou, Yong [1 ,2 ]
Zhao, Jiaqi [1 ,2 ]
Zhang, Di [1 ,2 ]
Yao, Rui [1 ,2 ]
Xue, Yong [3 ,4 ]
Affiliations
[1] China Univ Min & Technol, Sch Comp Sci & Technol, Xuzhou 221116, Jiangsu, Peoples R China
[2] Minist Educ Peoples Republ China, Engn Res Ctr Mine Digitizat, Xuzhou 221116, Jiangsu, Peoples R China
[3] China Univ Min & Technol, Sch Environm Sci & Spatial Informat, Xuzhou 221116, Jiangsu, Peoples R China
[4] Univ Derby, Sch Elect Comp & Math, Derby DE22 1GB, England
Funding
National Natural Science Foundation of China
Keywords
Transformers; Semantics; Image segmentation; Feature extraction; Convolutional neural networks; Remote sensing; Task analysis; Global information embedding; remote sensing (RS); semantic segmentation; Swin transformer; CLASSIFICATION; RECOGNITION;
DOI
10.1109/TGRS.2022.3144165
Chinese Library Classification (CLC) number
P3 [Geophysics]; P59 [Geochemistry];
Discipline classification codes
0708 ; 070902 ;
Abstract
Global context information is essential for the semantic segmentation of remote sensing (RS) images. However, most existing methods rely on convolutional neural networks (CNNs), which struggle to capture global context directly because the convolution operation is local. Inspired by the Swin transformer and its powerful global modeling capabilities, we propose a novel semantic segmentation framework for RS images called ST-U-shaped network (ST-UNet), which embeds the Swin transformer into the classical CNN-based UNet. ST-UNet is a novel dual-encoder structure in which the Swin transformer and a CNN run in parallel. First, we propose a spatial interaction module (SIM), which encodes spatial information in the Swin transformer block by establishing pixel-level correlations, enhancing the feature representation of occluded objects. Second, we construct a feature compression module (FCM) that reduces the loss of detailed information and condenses more small-scale features during patch token downsampling in the Swin transformer, which improves the segmentation accuracy for small-scale ground objects. Finally, as a bridge between the dual encoders, a relational aggregation module (RAM) is designed to hierarchically integrate global dependencies from the Swin transformer into the CNN features. ST-UNet brings significant improvements on both the ISPRS-Vaihingen and Potsdam datasets. The code will be available at https://github.com/XinnHe/ST-UNet.
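The abstract places the FCM in the patch token downsampling stage of the Swin transformer. For context, here is a minimal NumPy sketch of the standard Swin patch-merging step that this downsampling refers to. It is not the FCM itself, whose internals the abstract does not describe. The function name `patch_merging`, the caller-supplied `weight` matrix (learned in a real network), and the omission of LayerNorm's learnable scale and shift are simplifying assumptions made for illustration.

```python
import numpy as np


def patch_merging(x: np.ndarray, weight: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Swin-style patch merging: (H, W, C) -> (H/2, W/2, 2C).

    x:      feature map of shape (H, W, C); H and W must be even.
    weight: linear reduction matrix of shape (4C, 2C). It is learned in a real
            model; here it is passed in by the caller (illustrative assumption).
    """
    H, W, C = x.shape
    if H % 2 or W % 2:
        raise ValueError("H and W must be even")
    # Gather the four tokens of every non-overlapping 2x2 neighbourhood.
    x0 = x[0::2, 0::2, :]  # top-left
    x1 = x[1::2, 0::2, :]  # bottom-left
    x2 = x[0::2, 1::2, :]  # top-right
    x3 = x[1::2, 1::2, :]  # bottom-right
    merged = np.concatenate([x0, x1, x2, x3], axis=-1)  # (H/2, W/2, 4C)
    # LayerNorm over the channel axis (learnable affine omitted for brevity).
    mean = merged.mean(axis=-1, keepdims=True)
    var = merged.var(axis=-1, keepdims=True)
    normed = (merged - mean) / np.sqrt(var + eps)
    # Linear projection 4C -> 2C: resolution halves, channels double.
    return normed @ weight


# Usage with random, illustrative values (96 channels, as in Swin-T stage 1).
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 8, 96))
w = rng.standard_normal((4 * 96, 2 * 96)) * 0.02
out = patch_merging(feat, w)
print(out.shape)  # (4, 4, 192)
```

Each output token summarizes a 2x2 block through a single linear projection, so fine detail from very small objects can be diluted at this step. This is plausibly the kind of detail loss that the FCM is meant to reduce.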
Pages: 15
Related papers
50 in total
  • [1] Combining Swin Transformer With UNet for Remote Sensing Image Semantic Segmentation
    Fan, Lili
    Zhou, Yu
    Liu, Hongmei
    Li, Yunjie
    Cao, Dongpu
    [J]. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61 : 1 - 11
  • [2] Hybrid Shunted Transformer embedding UNet for remote sensing image semantic segmentation
    Zhou H.
    Xiao X.
    Li H.
    Liu X.
    Liang P.
    [J]. Neural Computing and Applications, 2024, 36 (25) : 15705 - 15720
  • [3] Swin Transformer Embedding Dual-Stream for Semantic Segmentation of Remote Sensing Imagery
    Zhou, Xuanyu
    Zhou, Lifan
    Gong, Shengrong
    Zhong, Shan
    Yan, Wei
    Huang, Yizhou
    [J]. IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2024, 17 : 175 - 189
  • [4] STransFuse: Fusing Swin Transformer and Convolutional Neural Network for Remote Sensing Image Semantic Segmentation
    Gao, Liang
    Liu, Hui
    Yang, Minhang
    Chen, Long
    Wan, Yaling
    Xiao, Zhengqing
    Qian, Yurong
    [J]. IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2021, 14 (14) : 10990 - 11003
  • [5] Swin-Conv-Dspp and Global Local Transformer for Remote Sensing Image Semantic Segmentation
    Mo, Youda
    Li, Huihui
    Xiao, Xiangling
    Zhao, Huimin
    Liu, Xiaoyong
    Zhan, Jin
    [J]. IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2023, 16 : 5284 - 5296
  • [6] Remote sensing image semantic segmentation combining UNET and FPN
    Wang Xi
    Yu Ming
    Ren Hong-e
    [J]. CHINESE JOURNAL OF LIQUID CRYSTALS AND DISPLAYS, 2021, 36 (03) : 475 - 483
  • [7] Class-Guided Swin Transformer for Semantic Segmentation of Remote Sensing Imagery
    Meng, Xiaoliang
    Yang, Yuechi
    Wang, Libo
    Wang, Teng
    Li, Rui
    Zhang, Ce
    [J]. IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2022, 19
  • [8] Semantic Segmentation Method for Remote Sensing Images Based on Improved Swin Transformer
    Wang, Yizhong
    Hu, Yaqi
    Wu, Xiaosuo
    Yan, Haowen
    Wang, Xiaocheng
    [J]. Computer Engineering and Applications, 2024, 60 (11) : 194 - 203
  • [9] Remote sensing image semantic segmentation based on cascaded Transformer
    Wang F.
    Ji J.
    Wang Y.
    [J]. IEEE Trans. Artif. Intell., 2024, 8 (4136-4148) : 1 - 12
  • [10] CNN and Transformer Fusion for Remote Sensing Image Semantic Segmentation
    Chen, Xin
    Li, Dongfen
    Liu, Mingzhe
    Jia, Jiaru
    [J]. REMOTE SENSING, 2023, 15 (18)