Attention-based fusion network for RGB-D semantic segmentation

Cited by: 0
Authors
Zhong, Li [1]
Guo, Chi [2, 3]
Zhan, Jiao [2]
Deng, JingYi [2]
Affiliations
[1] Wuhan Univ, Sch Geodesy & Geomat, Wuhan, Hubei, Peoples R China
[2] Wuhan Univ, Res Ctr GNSS, Wuhan 430072, Peoples R China
[3] Hubei Luojia Lab, Wuhan, Peoples R China
Keywords
RGB-D semantic segmentation; Cross-modal fusion; Attention mechanism
DOI
10.1016/j.neucom.2024.128371
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
RGB-D semantic segmentation supports a detailed understanding of scenes, which is crucial in many computer vision tasks. However, because of the inherent differences between the two modalities and the presence of image noise, existing methods still struggle to achieve accurate segmentation. In this paper, we propose an attention-based fusion network for RGB-D semantic segmentation. Specifically, our network builds on an encoder-decoder architecture and employs a forward multi-step propagation strategy together with a backward progressive bootstrap fusion strategy. By aggregating feature maps at different scales, we reduce the uncertainty in the final prediction. We also introduce a Channel and Spatial Rectification Module (CSRM) to enable multi-dimensional interactions and noise removal. To integrate RGB and depth information comprehensively, we feed the rectified features into the Cross-Attention Fusion Module (CAFM). Extensive experiments show that our network handles a wide range of complex scenes, achieving superior performance on the indoor NYU Depth V2 and SUN-RGBD datasets and extending to the outdoor Cityscapes dataset.
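The abstract does not describe the internals of the CAFM, so the following is only a generic illustration of cross-attention fusion between two modalities, not the authors' module. It assumes each modality is already reduced to a short list of feature tokens, uses identity projections in place of learned query/key/value weights, and averages the two residual streams. The names `cross_attend` and `fuse` are hypothetical. A real implementation would use a tensor library and learned projections.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, context):
    """Scaled dot-product attention: each query token attends over context tokens.

    Assumption: identity projections, so queries, keys and values are the raw
    features. Learned projections are omitted to keep the sketch short.
    """
    dim = len(queries[0])
    scale = math.sqrt(dim)
    attended = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in context]
        weights = softmax(scores)
        # Weighted sum of the context tokens (used here as the values).
        attended.append(
            [sum(w * v[d] for w, v in zip(weights, context)) for d in range(dim)]
        )
    return attended


def fuse(rgb_tokens, depth_tokens):
    """Symmetric cross-attention fusion (hypothetical, not the paper's CAFM).

    RGB tokens query the depth tokens and vice versa. Each stream keeps a
    residual connection, and the two streams are averaged token by token.
    Both inputs must have the same number of tokens and the same dimension.
    """
    rgb_from_depth = cross_attend(rgb_tokens, depth_tokens)
    depth_from_rgb = cross_attend(depth_tokens, rgb_tokens)
    fused = []
    for r, rd, d, dr in zip(rgb_tokens, rgb_from_depth, depth_tokens, depth_from_rgb):
        fused.append([0.5 * ((a + b) + (c + e)) for a, b, c, e in zip(r, rd, d, dr)])
    return fused


# Toy input: two tokens per modality, two channels each (made-up values).
rgb = [[1.0, 0.0], [0.0, 1.0]]
depth = [[0.5, 0.5], [1.0, 0.0]]
fused_tokens = fuse(rgb, depth)
```

Querying in both directions lets each modality borrow context from the other, while the residual terms preserve the original features when the attended signal is uninformative.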
Pages: 12