MIPANet: optimizing RGB-D semantic segmentation through multi-modal interaction and pooling attention

Cited: 0
Authors
Zhang, Shuai [1 ]
Xie, Minghong [1 ]
Affiliation
[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming, Peoples R China
Source
FRONTIERS IN PHYSICS | 2024, Vol. 12
Keywords
RGB-D semantic segmentation; attention mechanism; feature fusion; multi-modal interaction; feature enhancement; INFORMATION; FUSION;
DOI
10.3389/fphy.2024.1411559
CLC Classification
O4 [Physics]
Subject Classification Code
0702
Abstract
The semantic segmentation of RGB-D images requires understanding both the appearance of objects and their spatial relationships within a scene, which necessitates careful consideration of multiple factors. In indoor scenes, the presence of diverse and cluttered objects, together with illumination variations and the influence of adjacent objects, can easily cause pixels to be misclassified and thereby degrade the segmentation result. In response to these challenges, we propose a Multi-modal Interaction and Pooling Attention Network (MIPANet). The network is designed to exploit the interactive synergy between the RGB and depth modalities, improving the use of their complementary information and thus the segmentation accuracy. Specifically, we incorporate a Multi-modal Interaction Module (MIM) into the deepest layers of the network; this module fuses RGB and depth information so that the two modalities enhance and correct each other. Moreover, we introduce a Pooling Attention Module (PAM) at several stages of the encoder to strengthen the features extracted by the network. The outputs of the PAMs at different stages are selectively integrated into the decoder through a refinement module to improve semantic segmentation performance. Experimental results demonstrate that MIPANet outperforms existing methods on two indoor-scene datasets, NYU-Depth V2 and SUN-RGBD, by addressing the insufficient information interaction between different modalities in RGB-D semantic segmentation. The source code is available at https://github.com/2295104718/MIPANet.
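The full paper is not reproduced in this record, so the abstract's description of the Pooling Attention Module (PAM) and the Multi-modal Interaction Module (MIM) can only be illustrated speculatively. The PyTorch sketch below shows one plausible reading: a channel-attention block driven by global average pooling, and a cross-modal step in which each modality is re-weighted by attention computed from the other. All class names, channel sizes, and operations here are assumptions for illustration, not MIPANet's actual design.

```python
# Illustrative sketch only: the abstract does not specify MIPANet's exact layer
# design, so the module names, channel sizes, and operations below are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PoolingAttention(nn.Module):
    """Hypothetical pooling-attention block: global average pooling produces
    per-channel weights that re-scale the input feature map."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = F.adaptive_avg_pool2d(x, 1).view(b, c)   # pool away spatial dims
        w = self.fc(w).view(b, c, 1, 1)              # per-channel weights in (0, 1)
        return x * w                                 # re-weight the features


class MultiModalInteraction(nn.Module):
    """Hypothetical cross-modal fusion: each modality is enhanced by an
    attention signal computed from the other, then the two are summed."""

    def __init__(self, channels: int):
        super().__init__()
        self.rgb_att = PoolingAttention(channels)
        self.depth_att = PoolingAttention(channels)

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        rgb_enhanced = rgb + self.depth_att(depth)     # depth complements RGB
        depth_enhanced = depth + self.rgb_att(rgb)     # RGB complements depth
        return rgb_enhanced + depth_enhanced           # fused representation


if __name__ == "__main__":
    rgb_feat = torch.randn(2, 256, 15, 20)    # deepest-layer RGB features (assumed shape)
    depth_feat = torch.randn(2, 256, 15, 20)  # deepest-layer depth features (assumed shape)
    fused = MultiModalInteraction(256)(rgb_feat, depth_feat)
    print(fused.shape)  # torch.Size([2, 256, 15, 20])
```

Per the abstract, the cross-modal interaction is applied only in the deepest encoder layers, while the pooling attention is applied at several encoder stages and its outputs are refined before entering the decoder; the exact wiring is described in the paper itself.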
Pages: 13