Attention-based fusion network for RGB-D semantic segmentation

Cited by: 0
Authors
Zhong, Li [1 ]
Guo, Chi [2 ,3 ]
Zhan, Jiao [2 ]
Deng, JingYi [2 ]
Affiliations
[1] Wuhan Univ, Sch Geodesy & Geomat, Wuhan, Hubei, Peoples R China
[2] Wuhan Univ, Res Ctr GNSS, Wuhan 430072, Peoples R China
[3] Hubei Luojia Lab, Wuhan, Peoples R China
Keywords
RGB-D semantic segmentation; Cross-modal fusion; Attention mechanism;
DOI
10.1016/j.neucom.2024.128371
CLC classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
RGB-D semantic segmentation enables a deeper understanding of scenes, which is crucial for many computer vision tasks. However, due to inherent differences between the two modalities and image noise, achieving high-quality segmentation with existing methods remains challenging. In this paper, we propose an attention-based fusion network for RGB-D semantic segmentation. Specifically, our network employs a forward multi-step propagation strategy and a backward progressive bootstrap fusion strategy built on an encoder-decoder architecture. By aggregating feature maps at different scales, we effectively reduce the uncertainty in the final prediction. Meanwhile, we introduce a Channel and Spatial Rectification Module (CSRM) to enable multi-dimensional interaction and noise removal. To integrate RGB and depth images comprehensively, we feed the rectified features into a Cross-Attention Fusion Module (CAFM). Extensive experiments show that our network adeptly handles a diverse array of complex scenarios, achieving superior performance and robust effectiveness on the indoor NYU Depth V2 and SUN RGB-D datasets and extending to the outdoor Cityscapes dataset.
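The abstract gives no implementation details of the CAFM, but the cross-modal attention fusion it describes can be illustrated with a toy bidirectional cross-attention between flattened RGB and depth feature tokens. This is a minimal sketch under assumed shapes and a symmetric-sum fusion rule; the function names and design choices here are illustrative assumptions, not the authors' actual module:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(rgb, depth):
    """Toy cross-modal attention fusion (illustrative, not the paper's CAFM).

    rgb, depth: (n_tokens, dim) matrices of flattened feature tokens.
    RGB tokens attend over depth tokens and vice versa; the two
    attended feature maps are summed into one fused representation.
    """
    d = rgb.shape[-1]
    # RGB queries attend to depth keys/values.
    attn_rgb_to_depth = softmax(rgb @ depth.T / np.sqrt(d))
    rgb_enhanced = attn_rgb_to_depth @ depth
    # Depth queries attend to RGB keys/values.
    attn_depth_to_rgb = softmax(depth @ rgb.T / np.sqrt(d))
    depth_enhanced = attn_depth_to_rgb @ rgb
    return rgb_enhanced + depth_enhanced

rng = np.random.default_rng(0)
rgb_feat = rng.standard_normal((16, 8))    # 16 tokens, 8-dim each
depth_feat = rng.standard_normal((16, 8))
fused = cross_attention_fuse(rgb_feat, depth_feat)
print(fused.shape)  # (16, 8): same token grid, cross-modally enriched
```

In a real network the queries, keys, and values would be learned linear projections of multi-scale encoder features rather than the raw tokens used here.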
Pages: 12
Related Papers (50 total)
  • [1] Attention-based three-branch network for RGB-D indoor semantic segmentation
    Lei, Bo
    Guo, Peiyan
    Jia, Shaoyun
    DIGITAL SIGNAL PROCESSING, 2025, 162
  • [2] Cross-modal attention fusion network for RGB-D semantic segmentation
    Zhao, Qiankun
    Wan, Yingcai
    Xu, Jiqian
    Fang, Lijin
    NEUROCOMPUTING, 2023, 548
  • [3] RAFNet: RGB-D attention feature fusion network for indoor semantic segmentation
    Yan, Xingchao
    Hou, Sujuan
    Karim, Awudu
    Jia, Weikuan
    DISPLAYS, 2021, 70
  • [4] A Fusion Network for Semantic Segmentation Using RGB-D Data
    Yuan, Jiahui
    Zhang, Kun
    Xia, Yifan
    Qi, Lin
    Dong, Junyu
    NINTH INTERNATIONAL CONFERENCE ON GRAPHIC AND IMAGE PROCESSING (ICGIP 2017), 2018, 10615
  • [5] The Network of Attention-Aware Multimodal fusion for RGB-D Indoor Semantic Segmentation Method
    Zhao, Qiankun
    Wan, Yingcai
    Fang, Lijin
    Wang, Huaizhen
    2022 34TH CHINESE CONTROL AND DECISION CONFERENCE, CCDC, 2022, : 5093 - 5098
  • [6] DCANet: Differential convolution attention network for RGB-D semantic segmentation
    Bai, Lizhi
    Yang, Jun
    Tian, Chunqi
    Sun, Yaoru
    Mao, Maoyu
    Xu, Yanjun
    Xu, Weirong
    PATTERN RECOGNITION, 2025, 162
  • [7] CANet: Co-attention network for RGB-D semantic segmentation
    Zhou, Hao
    Qi, Lu
    Huang, Hai
    Yang, Xu
    Wan, Zhaoliang
    Wen, Xianglong
    PATTERN RECOGNITION, 2022, 124
  • [8] CDMANet: central difference mutual attention network for RGB-D semantic segmentation
    Ge, Mengjiao
    Su, Wen
    Gao, Jinfeng
    Jia, Guoqiang
    JOURNAL OF SUPERCOMPUTING, 2025, 81 (01):
  • [9] Transformer fusion for indoor RGB-D semantic segmentation
    Wu, Zongwei
    Zhou, Zhuyun
    Allibert, Guillaume
    Stolz, Christophe
    Demonceaux, Cedric
    Ma, Chao
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2024, 249
  • [10] PSCNet: Efficient RGB-D Semantic Segmentation Parallel Network Based on Spatial and Channel Attention
    Du, S. Q.
    Tang, S. J.
    Wang, W. X.
    Li, X. M.
    Lu, Y. H.
    Guo, R. Z.
    XXIV ISPRS CONGRESS: IMAGING TODAY, FORESEEING TOMORROW, COMMISSION I, 2022, 5-1 : 129 - 136