Efficient Depth Fusion Transformer for Aerial Image Semantic Segmentation

Cited by: 16
Authors
Yan, Li [1, 2]
Huang, Jianming [1 ]
Xie, Hong [1 ]
Wei, Pengcheng [1 ]
Gao, Zhao [2 ]
Affiliations
[1] Wuhan Univ, Sch Geodesy & Geomat, Wuhan 430079, Peoples R China
[2] Wuhan Univ, Sch Comp Sci, Wuhan 430072, Peoples R China
Keywords
semantic segmentation; self-attention; depth fusion; transformer; RESOLUTION; RGB
DOI
10.3390/rs14051294
Chinese Library Classification
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
Taking depth into consideration has been shown to improve semantic segmentation performance by providing additional geometric information. Most existing works adopt a two-stream network that extracts features from color images and depth images separately using two branches of the same structure, which incurs high memory and computation costs. We find that depth features acquired by simple downsampling can also play a complementary role in the semantic segmentation task, sometimes performing even better than the two-stream scheme with two identical branches. In this paper, a novel and efficient depth fusion transformer network for aerial image segmentation is proposed. The presented network uses patch merging to downsample the depth input, and a depth-aware self-attention (DSA) module is designed to mitigate the gap caused by the differences between the two branches and the two modalities. Concretely, the DSA fuses depth features and color features by computing depth similarity and its impact on the self-attention map calculated from the color features. Extensive experiments on the ISPRS 2D semantic segmentation dataset validate the efficiency and effectiveness of our method. With nearly half the parameters of the traditional two-stream scheme, our method achieves 83.82% mIoU on the Vaihingen dataset, outperforming other state-of-the-art methods, and 87.43% mIoU on the Potsdam dataset, comparable to the state of the art.
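The abstract names two mechanisms: patch merging to downsample the depth input, and a depth-aware self-attention step in which depth similarity influences the attention map computed from color features. The sketch below is only an illustration of that general idea, not the paper's implementation. The abstract does not give the formula, so the Gaussian depth similarity, the `sigma` parameter, the absence of learned projections, and the function names `patch_merge` and `depth_aware_attention` are all assumptions made for this example.

```python
import math

def patch_merge(depth, k=2):
    """Group each k x k patch of an H x W depth map into one token.
    Each token is the flattened list of the k*k depth values in its patch."""
    h, w = len(depth), len(depth[0])
    tokens = []
    for i in range(0, h - h % k, k):
        for j in range(0, w - w % k, k):
            tokens.append([depth[i + di][j + dj]
                           for di in range(k) for dj in range(k)])
    return tokens

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def depth_aware_attention(color, depth_tokens, sigma=1.0):
    """color: N tokens of dimension C, used directly as queries, keys and values.
    depth_tokens: N depth tokens aligned with the color tokens.
    Returns N fused tokens of dimension C."""
    n, c = len(color), len(color[0])
    fused = []
    for i in range(n):
        logits = []
        for j in range(n):
            # Scaled dot-product score from the color features.
            color_score = dot(color[i], color[j]) / math.sqrt(c)
            # ASSUMPTION: Gaussian depth similarity, applied as a log-bias.
            # Adding log(sim) to the logit is equivalent to multiplying the
            # attention weight by sim and renormalizing.
            d2 = sum((a - b) ** 2
                     for a, b in zip(depth_tokens[i], depth_tokens[j]))
            logits.append(color_score - d2 / (2 * sigma ** 2))
        weights = softmax(logits)
        fused.append([sum(weights[j] * color[j][ch] for j in range(n))
                      for ch in range(c)])
    return fused

# Toy usage: a 4 x 4 depth map becomes four depth tokens, one per color token.
depth = [[0, 0, 10, 10],
         [0, 0, 10, 10],
         [10, 10, 10, 10],
         [10, 10, 10, 10]]
color = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
fused = depth_aware_attention(color, patch_merge(depth))
```

In this toy input the first token lies at a very different depth from the other three, so its attention weights concentrate on itself and its fused feature stays close to its own color feature. A single-branch design of this kind avoids a second full feature extractor for depth, which is the efficiency argument the abstract makes.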
Pages: 18
Related papers
50 in total
  • [1] CNN and Transformer Fusion for Remote Sensing Image Semantic Segmentation
    Chen, Xin
    Li, Dongfen
    Liu, Mingzhe
    Jia, Jiaru
    [J]. REMOTE SENSING, 2023, 15 (18)
  • [2] Pyramid Fusion Transformer for Semantic Segmentation
    Qin, Zipeng
    Liu, Jianbo
    Zhang, Xiaolin
    Tian, Maoqing
    Zhou, Aojun
    Yi, Shuai
    Li, Hongsheng
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 9630 - 9643
  • [3] Hybrid Attention Fusion Embedded in Transformer for Remote Sensing Image Semantic Segmentation
    Chen, Yan
    Dong, Quan
    Wang, Xiaofeng
    Zhang, Qianchuan
    Kang, Menglei
    Jiang, Wenxiang
    Wang, Mengyuan
    Xu, Lixiang
    Zhang, Chen
    [J]. IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2024, 17 : 4421 - 4435
  • [4] Few-Shot Aerial Image Semantic Segmentation Leveraging Pyramid Correlation Fusion
    Ao, Wei
    Zheng, Shunyi
    Meng, Yan
    Gao, Zhi
    [J]. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61 : 1 - 12
  • [5] A Unified Efficient Pyramid Transformer for Semantic Segmentation
    Zhu, Fangrui
    Zhu, Yi
    Zhang, Li
    Wu, Chongruo
    Fu, Yanwei
    Li, Mu
    [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021), 2021 : 2667 - 2677
  • [6] Efficient semantic segmentation with pyramidal fusion
    Orsic, Marin
    Segvic, Sinisa
    [J]. PATTERN RECOGNITION, 2021, 110
  • [7] A Depth Image Fusion Network for 3D Point Cloud Semantic Segmentation
    Wang, Zhou
    Jia, Zixi
    Lyu, Ao
    Wang, Yating
    Sun, Changsheng
    Liu, Yongxin
    [J]. 2019 9TH IEEE ANNUAL INTERNATIONAL CONFERENCE ON CYBER TECHNOLOGY IN AUTOMATION, CONTROL, AND INTELLIGENT SYSTEMS (IEEE-CYBER 2019), 2019 : 849 - 853
  • [8] DHT: Deformable Hybrid Transformer for Aerial Image Segmentation
    Zhang, Yan
    Gao, Xiyuan
    Duan, Qingyan
    Yuan, Lin
    Gao, Xinbo
    [J]. IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2022, 19
  • [9] CMTFNet: CNN and Multiscale Transformer Fusion Network for Remote-Sensing Image Semantic Segmentation
    Wu, Honglin
    Huang, Peng
    Zhang, Min
    Tang, Wenlong
    Yu, Xinyu
    [J]. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
  • [10] CTFNet: CNN-Transformer Fusion Network for Remote-Sensing Image Semantic Segmentation
    Wu, Honglin
    Huang, Peng
    Zhang, Min
    Tang, Wenlong
    [J]. IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2024, 21 : 1 - 5