Attention-based fusion network for RGB-D semantic segmentation

Cited by: 0

Authors:

Zhong, Li [1]

Guo, Chi [2,3]

Zhan, Jiao [2]

Deng, JingYi [2]

Affiliations:

[1] Wuhan Univ, Sch Geodesy & Geomat, Wuhan, Hubei, Peoples R China

[2] Wuhan Univ, Res Ctr GNSS, Wuhan 430072, Peoples R China

[3] Hubei Luojia Lab, Wuhan, Peoples R China

Source:

NEUROCOMPUTING | 2024, Vol. 608

Keywords:

RGB-D semantic segmentation; Cross-modal fusion; Attention mechanism

DOI:

10.1016/j.neucom.2024.128371

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

RGB-D semantic segmentation can realize a profound comprehension of scenes, which is crucial in various computer vision tasks. However, due to the inherent modal variances and image noise, achieving superior segmentation using existing methods remains challenging. In this paper, we propose an attention-based fusion network for RGB-D semantic segmentation. Specifically, our network employs a forward multi-step propagation strategy and a backward progressive bootstrap fusion strategy based on the encoder-decoder architecture. By aggregating feature maps at different scales, we effectively diminish the uncertainty in the final prediction. Meanwhile, we introduce a Channel and Spatial Rectification Module (CSRM) to enable multi-dimensional interactions and noise removal. To achieve comprehensive integration between RGB and depth images, we feed the rectified features into the Cross-Attention Fusion Module (CAFM). Extensive experiments show that our network can adeptly manage a diverse array of complex scenarios, demonstrating superior performance and robust effectiveness on the indoor NYU Depth V2 and SUN-RGBD datasets, and extending its capabilities to the outdoor Cityscapes dataset.
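The abstract names cross-attention as the mechanism for combining RGB and depth features but gives no formulation. As a rough illustration only, the sketch below shows generic scaled dot-product cross-attention in which each modality queries the other, followed by a residual sum. It is not the paper's CAFM: the function names (`cross_attention`, `fuse_rgb_depth`), the token layout (one feature vector per spatial position), the absence of learned projections, and the summation used for fusion are all assumptions made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: every query token attends over all key tokens.

    queries: list of N vectors, keys/values: lists of M vectors (same width).
    Returns a list of N attended vectors.
    """
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    attended = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        attended.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
        )
    return attended


def fuse_rgb_depth(rgb_tokens, depth_tokens):
    """Hypothetical bidirectional fusion (an assumption, not the paper's design).

    RGB tokens query the depth tokens and vice versa; each stream keeps a
    residual connection, and the two updated streams are summed per position.
    """
    rgb_from_depth = cross_attention(rgb_tokens, depth_tokens, depth_tokens)
    depth_from_rgb = cross_attention(depth_tokens, rgb_tokens, rgb_tokens)
    fused = []
    for r, rd, d, dr in zip(rgb_tokens, rgb_from_depth, depth_tokens, depth_from_rgb):
        fused.append([ri + rdi + di + dri for ri, rdi, di, dri in zip(r, rd, d, dr)])
    return fused


if __name__ == "__main__":
    rgb = [[1.0, 0.0], [0.0, 1.0]]      # two toy RGB feature tokens
    depth = [[0.5, 0.5], [0.5, 0.5]]    # two toy depth feature tokens
    print(fuse_rgb_depth(rgb, depth))
```

A real network would apply learned query, key, and value projections and operate on multi-channel feature maps at several encoder scales; the toy version omits these to keep the attention arithmetic visible.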

Pages: 12
