MIPANet: optimizing RGB-D semantic segmentation through multi-modal interaction and pooling attention

被引:0
|
作者
Zhang, Shuai [1 ]
Xie, Minghong [1 ]
机构
[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming, Peoples R China
来源
FRONTIERS IN PHYSICS | 2024年 / 12卷
关键词
RGB-D semantic segmentation; attention mechanism; feature fusion; multi-modal interaction; feature enhancement; INFORMATION; FUSION;
D O I
10.3389/fphy.2024.1411559
中图分类号
O4 [物理学];
学科分类号
0702 ;
摘要
The semantic segmentation of RGB-D images involves understanding objects appearances and spatial relationships within a scene, which necessitates careful consideration of multiple factors. In indoor scenes, the presence of diverse and disorderly objects, coupled with illumination variations and the influence of adjacent objects, can easily result in misclassifications of pixels, consequently affecting the outcome of semantic segmentation. We propose a Multi-modal Interaction and Pooling Attention Network (MIPANet) in response to these challenges. This network is designed to exploit the interactive synergy between RGB and depth modalities, aiming to enhance the utilization of complementary information and improve segmentation accuracy. Specifically, we incorporate a Multi-modal Interaction Module (MIM) into the deepest layers of the network. This module is engineered to facilitate the fusion of RGB and depth information, allowing for mutual enhancement and correction. Moreover, we introduce a Pooling Attention Module (PAM) at various stages of the encoder to enhance the features extracted by the network. The outputs of the PAMs at different stages are selectively integrated into the decoder through a refinement module to improve semantic segmentation performance. Experimental results demonstrate that MIPANet outperforms existing methods on two indoor scene datasets, NYU-Depth V2 and SUN-RGBD, by optimizing the insufficient information interaction between different modalities in RGB-D semantic segmentation. The source codes are available at https://github.com/2295104718/MIPANet.
引用
收藏
页数:13
相关论文
共 50 条
  • [31] Multi-scale fusion for RGB-D indoor semantic segmentation
    Jiang, Shiyi
    Xu, Yang
    Li, Danyang
    Fan, Runze
    SCIENTIFIC REPORTS, 2022, 12 (01):
  • [32] RAFNet: RGB-D attention feature fusion network for indoor semantic segmentation
    Yan, Xingchao
    Hou, Sujuan
    Karim, Awudu
    Jia, Weikuan
    DISPLAYS, 2021, 70
  • [33] Multi-scale fusion for RGB-D indoor semantic segmentation
    Shiyi Jiang
    Yang Xu
    Danyang Li
    Runze Fan
    Scientific Reports, 12 (1)
  • [34] Progressive Guided Fusion Network With Multi-Modal and Multi-Scale Attention for RGB-D Salient Object Detection
    Wu, Jiajia
    Han, Guangliang
    Wang, Haining
    Yang, Hang
    Li, Qingqing
    Liu, Dongxu
    Ye, Fangjian
    Liu, Peixun
    IEEE ACCESS, 2021, 9 : 150608 - 150622
  • [35] Ingredient-guided multi-modal interaction and refinement network for RGB-D food nutrition assessment
    Nian, Fudong
    Hu, Yujie
    Gu, Yanhong
    Wu, Zhize
    Yang, Shimeng
    Shu, Jianhua
    DIGITAL SIGNAL PROCESSING, 2024, 153
  • [36] Attention-Aware and Semantic-Aware Network for RGB-D Indoor Semantic Segmentation
    Duan L.-J.
    Sun Q.-C.
    Qiao Y.-H.
    Chen J.-C.
    Cui G.-Q.
    Jisuanji Xuebao/Chinese Journal of Computers, 2021, 44 (02): : 275 - 291
  • [37] Joining geometric and RGB features for RGB-D semantic segmentation
    Zhang, Shaopeng
    Zhong, Min
    Zeng, Gang
    Gan, Rui
    2019 INTERNATIONAL CONFERENCE ON IMAGE AND VIDEO PROCESSING, AND ARTIFICIAL INTELLIGENCE, 2019, 11321
  • [38] Multi-modal interactive attention and dual progressive decoding network for RGB-D/T salient object detection
    Liang, Yanhua
    Qin, Guihe
    Sun, Minghui
    Qin, Jun
    Yan, Jie
    Zhang, Zhonghan
    NEUROCOMPUTING, 2022, 490 : 132 - 145
  • [39] Semantic Segmentation of Defects in Infrastructures through Multi-modal Images
    Shahsavarani, Sara
    Lopez, Fernando
    Ibarra-Castanedo, Clemente
    Maldague, Xavier P., V
    THERMOSENSE: THERMAL INFRARED APPLICATIONS XLVI, 2024, 13047
  • [40] Multi-modal semantic image segmentation
    Pemasiri, Akila
    Kien Nguyen
    Sridharan, Sridha
    Fookes, Clinton
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2021, 202