Swin Transformer Embedding UNet for Remote Sensing Image Semantic Segmentation

Cited by: 292
Authors
He, Xin [1 ,2 ]
Zhou, Yong [1 ,2 ]
Zhao, Jiaqi [1 ,2 ]
Zhang, Di [1 ,2 ]
Yao, Rui [1 ,2 ]
Xue, Yong [3 ,4 ]
Affiliations
[1] China Univ Min & Technol, Sch Comp Sci & Technol, Xuzhou 221116, Jiangsu, Peoples R China
[2] Minist Educ Peoples Republ China, Engn Res Ctr Mine Digitizat, Xuzhou 221116, Jiangsu, Peoples R China
[3] China Univ Min & Technol, Sch Environm Sci & Spatial Informat, Xuzhou 221116, Jiangsu, Peoples R China
[4] Univ Derby, Sch Elect Comp & Math, Derby DE22 1GB, England
Funding
National Natural Science Foundation of China
Keywords
Transformers; Semantics; Image segmentation; Feature extraction; Convolutional neural networks; Remote sensing; Task analysis; Global information embedding; remote sensing (RS); semantic segmentation; Swin transformer; CLASSIFICATION; RECOGNITION;
DOI
10.1109/TGRS.2022.3144165
Chinese Library Classification (CLC) numbers
P3 [Geophysics]; P59 [Geochemistry];
Subject classification codes
0708 ; 070902 ;
Abstract
Global context information is essential for the semantic segmentation of remote sensing (RS) images. However, most existing methods rely on convolutional neural networks (CNNs), which struggle to capture global context directly because the convolution operation is local. Inspired by the strong global modeling capability of the Swin transformer, we propose a novel semantic segmentation framework for RS images called ST-UNet, which embeds the Swin transformer into the classical CNN-based U-shaped network (UNet). ST-UNet adopts a novel dual-encoder structure in which a Swin transformer encoder and a CNN encoder run in parallel. First, we propose a spatial interaction module (SIM), which encodes spatial information in the Swin transformer block by establishing pixel-level correlations, enhancing the feature representation of occluded objects. Second, we construct a feature compression module (FCM) that reduces the loss of detailed information and condenses more small-scale features during patch token downsampling in the Swin transformer, improving the segmentation accuracy of small-scale ground objects. Finally, as a bridge between the two encoders, a relational aggregation module (RAM) hierarchically integrates global dependencies from the Swin transformer into the CNN features. ST-UNet brings significant improvements on both the ISPRS Vaihingen and Potsdam datasets. The code will be available at https://github.com/XinnHe/ST-UNet.
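The FCM operates on the patch token downsampling step of the Swin transformer. The abstract does not describe the FCM's internals, so the sketch below shows only the standard Swin patch-merging step it builds on, not the authors' module. Each 2×2 group of neighboring tokens is concatenated along the channel axis (an H×W×C token map becomes (H/2)×(W/2)×4C), and a bias-free linear layer then reduces 4C channels to 2C. The function names are hypothetical, the LayerNorm that Swin applies before the projection is omitted, and plain Python lists stand in for tensors.

```python
from typing import List

# Token map stored as nested lists indexed [row][col][channel], i.e. H x W x C.
# Batch dimension omitted; plain lists stand in for tensors (assumption for clarity).
TokenMap = List[List[List[float]]]


def patch_merge(tokens: TokenMap) -> TokenMap:
    """Standard Swin-style 2x2 patch merging (concatenation step only).

    Each 2x2 neighbourhood of tokens is concatenated along the channel axis in
    the order (even row, even col), (odd row, even col), (even row, odd col),
    (odd row, odd col). Output shape: (H/2) x (W/2) x 4C.
    """
    h, w = len(tokens), len(tokens[0])
    if h % 2 or w % 2:
        raise ValueError("H and W must both be even for 2x2 patch merging")
    merged: TokenMap = []
    for i in range(0, h, 2):
        row = []
        for j in range(0, w, 2):
            # List concatenation joins the four C-dim vectors into one 4C-dim vector.
            row.append(tokens[i][j] + tokens[i + 1][j] + tokens[i][j + 1] + tokens[i + 1][j + 1])
        merged.append(row)
    return merged


def linear_reduce(tokens: TokenMap, weight: List[List[float]]) -> TokenMap:
    """Bias-free linear projection applied independently to every token.

    `weight` is out_dim x in_dim. In Swin, in_dim = 4C and out_dim = 2C, and a
    LayerNorm is applied before this projection (omitted here).
    """
    return [
        [[sum(w_k * x_k for w_k, x_k in zip(w_row, tok)) for w_row in weight] for tok in row]
        for row in tokens
    ]


if __name__ == "__main__":
    # 4 x 4 token map with C = 1, token value = 4 * row + col.
    grid = [[[float(4 * r + c)] for c in range(4)] for r in range(4)]
    merged = patch_merge(grid)
    print(merged[0][0])  # [0.0, 4.0, 1.0, 5.0]
    # Reduce 4C = 4 channels to 2C = 2 channels with an illustrative weight matrix.
    reduced = linear_reduce(merged, [[0.25, 0.25, 0.25, 0.25], [1.0, 0.0, 0.0, 0.0]])
    print(reduced[0][0])  # [2.5, 0.0]
```

Because the projection halves the channels produced by the 2×2 concatenation while the spatial resolution also halves, one plausible source of the detail loss the abstract mentions is that fine, token-level information from small objects must be squeezed into fewer channels at this step.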
Pages: 15
Related papers
50 in total
  • [41] CMTFNet: CNN and Multiscale Transformer Fusion Network for Remote-Sensing Image Semantic Segmentation
    Wu, Honglin
    Huang, Peng
    Zhang, Min
    Tang, Wenlong
    Yu, Xinyu
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
  • [42] CTFNet: CNN-Transformer Fusion Network for Remote-Sensing Image Semantic Segmentation
    Wu H.
    Huang P.
    Zhang M.
    Tang W.
    IEEE Geoscience and Remote Sensing Letters, 2024, 21 : 1 - 5
  • [43] DENSE SWIN-UNET: DENSE SWIN TRANSFORMERS FOR SEMANTIC SEGMENTATION OF PNEUMOTHORAX IN CT IMAGES
    Tang, Zhixian
    Zhang, Jinyang
    Bai, Chulin
    Zhang, Yan
    Liang, Kaiyi
    Yao, Xufeng
    JOURNAL OF MECHANICS IN MEDICINE AND BIOLOGY, 2023, 23 (08)
  • [44] Semantic Segmentation Method of UAV Image Based on Window Attention Aggregation Swin Transformer
    Li, Junjie
    Yi, Shi
    He, Runhua
    Liu, Xi
    Computer Engineering and Applications, 2024, 60 (15) : 198 - 210
  • [46] Indoor semantic segmentation based on Swin-Transformer
    Zheng, Yunping
    Xu, Yuan
    Shu, Shiqiang
    Sarem, Mudar
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2024, 98
  • [47] Improved Swin Transformer-Based Semantic Segmentation of Postearthquake Dense Buildings in Urban Areas Using Remote Sensing Images
    Cui, Liangyi
    Jing, Xin
    Wang, Yu
    Huan, Yixuan
    Xu, Yang
    Zhang, Qiangqiang
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2023, 16 : 369 - 385
  • [48] Axis-Based Transformer UNet for RGB Remote Sensing Image Denoising
    Zhu, Zhiliang
    Zhang, Siyi
    Qiu, Leiningxin
    Wang, Hui
    Luo, Guoliang
    IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 2515 - 2519
  • [49] MBT-UNet: Multi-Branch Transform Combined with UNet for Semantic Segmentation of Remote Sensing Images
    Liu, Bin
    Li, Bing
    Sreeram, Victor
    Li, Shuofeng
    REMOTE SENSING, 2024, 16 (15)
  • [50] Remote Sensing Image Detection and Segmentation Based on Word Embedding
    You H.-F.
    Tian S.-W.
    Yu L.
    Lü Y.-L.
    Tian, Sheng-Wei (tianshengwei@163.com), 1600, Chinese Institute of Electronics, 48 : 75 - 83