Transformer-Based Decoder Designs for Semantic Segmentation on Remotely Sensed Images

Cited by: 42
Authors
Panboonyuen, Teerapong [1]
Jitkajornwanich, Kulsawasd [2]
Lawawirojwong, Siam [3]
Srestasathiern, Panu [3]
Vateekul, Peerapon [1]
Affiliations
[1] Chulalongkorn Univ, Fac Engn, Dept Comp Engn, Phayathai Rd, Bangkok 10330, Thailand
[2] King Mongkuts Inst Technol Ladkrabang, Dept Comp Sci, Data Sci & Computat Intelligence DSCI Lab, Chalongkrung Rd, Bangkok 10520, Thailand
[3] Geoinformat & Space Technol Dev Agcy Publ Org, 120 Govt Complex,Chaeng Wattana Rd, Bangkok 10210, Thailand
Keywords
vision transformer; fully transformer networks; convolutional neural network; feature pyramid network; high-resolution representations; ISPRS Vaihingen; Landsat-8;
DOI
10.3390/rs13245100
Chinese Library Classification (CLC) number
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
Transformers have achieved strong results in several natural language processing (NLP) tasks as well as image processing tasks. Herein, we present a deep-learning (DL) model that improves a semantic segmentation network in two ways. First, it uses a pretrained Swin Transformer (SwinTF), a Vision Transformer (ViT) variant, as the backbone and adapts it to downstream tasks by adding task layers on top of the pretrained encoder. Second, three decoder designs, U-Net, pyramid scene parsing (PSP) network, and feature pyramid network (FPN), are applied to the network to perform pixel-level segmentation. The results are compared with other state-of-the-art (SOTA) image labeling methods, such as the global convolutional network (GCN) and ViT. Extensive experiments show that SwinTF with these decoder designs reaches a new state of the art on the Thailand Isan Landsat-8 corpus (89.8% F1 score) and the Thailand North Landsat-8 corpus (63.12% F1 score), and achieves competitive results on ISPRS Vaihingen. Moreover, both of our best proposed methods (SwinTF-PSP and SwinTF-FPN) outperform SwinTF with supervised ViT pre-training on ImageNet-1K on the Thailand Landsat-8 and ISPRS Vaihingen corpora.
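The abstract reports results as F1 scores. As a rough illustration of how such a score can be computed for a segmentation output, the sketch below derives per-class F1 from flattened pixel labels and averages it over classes. The function names `per_class_f1` and `macro_f1` are hypothetical, and the macro averaging is an assumption: this record does not state which averaging protocol the paper uses.

```python
from collections import Counter
from typing import Dict, Sequence


def per_class_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[int, float]:
    """Return the F1 score of each class from flattened pixel labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # pixel correctly assigned to class t
        else:
            fp[p] += 1          # pixel wrongly assigned to class p
            fn[t] += 1          # pixel of class t that was missed

    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        # F1 = 2*TP / (2*TP + FP + FN), equivalent to the harmonic
        # mean of precision and recall.
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores[c] = 2 * tp[c] / denom if denom else 0.0
    return scores


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 (assumed averaging, see lead-in)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Toy example: six pixels, three classes.
    truth = [0, 0, 1, 1, 2, 2]
    pred = [0, 1, 1, 1, 2, 0]
    print(per_class_f1(truth, pred))  # class 0 -> 0.5, class 1 -> 0.8
    print(macro_f1(truth, pred))
```

In practice the label maps would be 2-D arrays flattened to one dimension before being passed in; a weighted or micro average would give different numbers on imbalanced land-cover classes.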
Pages: 21
Related papers (10 of 50 shown)
  • [1] MUSTER: A Multi-Scale Transformer-Based Decoder for Semantic Segmentation
    Xu, Jing
    Shi, Wentao
    Gao, Pan
    Li, Qizhu
    Wang, Zhengwei
    [J]. IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2024
  • [2] A Transformer-based Semantic Segmentation Model for Street Fashion Images
    Peng, Dingjie
    Kameyama, Wataru
    [J]. INTERNATIONAL WORKSHOP ON ADVANCED IMAGING TECHNOLOGY, IWAIT 2023, 2023, 12592
  • [3] A Transformer-Based Decoder for Semantic Segmentation with Multi-level Context Mining
    Shi, Bowen
    Jiang, Dongsheng
    Zhang, Xiaopeng
    Li, Han
    Dai, Wenrui
    Zou, Junni
    Xiong, Hongkai
    Tian, Qi
    [J]. COMPUTER VISION - ECCV 2022, PT XXVIII, 2022, 13688 : 624 - 639
  • [4] FULLY CONVOLUTIONAL AND FEEDFORWARD NETWORKS FOR THE SEMANTIC SEGMENTATION OF REMOTELY SENSED IMAGES
    Pastorino, Martina
    Moser, Gabriele
    Serpico, Sebastiano B.
    Zerubia, Josiane
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022: 1876 - 1880
  • [5] TransRSS: Transformer-based Radar Semantic Segmentation
    Zou, Hao
    Xie, Zhen
    Ou, Jiarong
    Gao, Yutao
    [J]. 2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA, 2023: 6965 - 6972
  • [6] Trans-VNet: Transformer-based tooth semantic segmentation in CBCT images
    Wang, Chen
    Yang, Jingyu
    Wu, Baoyu
    Liu, Ruijun
    Yu, Peng
    [J]. BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2024, 97
  • [7] Duplex Restricted Network With Guided Upsampling for the Semantic Segmentation of Remotely Sensed Images
    Wang, Xiaoyu
    Liang, Longxue
    Yan, Haowen
    Wu, Xiaosuo
    Lu, Wanzhen
    Cai, Jiali
    [J]. IEEE ACCESS, 2021, 9 : 42438 - 42448
  • [8] Transformer-Based Semantic Segmentation for Recycling Materials in Construction
    Wang, Xin
    Han, Wei
    Mo, Sicheng
    Cai, Ting
    Gong, Yijing
    Li, Yin
    Zhu, Zhenhua
    [J]. COMPUTING IN CIVIL ENGINEERING 2023-DATA, SENSING, AND ANALYTICS, 2024: 25 - 33
  • [9] Wavelet-based texture segmentation of remotely sensed images
    Acharyya, M
    Kundu, MK
    [J]. 11TH INTERNATIONAL CONFERENCE ON IMAGE ANALYSIS AND PROCESSING, PROCEEDINGS, 2001: 69 - 74
  • [10] Evaluating Transformer-based Semantic Segmentation Networks for Pathological Image Segmentation
    Cam Nguyen
    Asad, Zuhayr
    Deng, Ruining
    Huo, Yuankai
    [J]. MEDICAL IMAGING 2022: IMAGE PROCESSING, 2022, 12032