MIPANet: optimizing RGB-D semantic segmentation through multi-modal interaction and pooling attention

Authors
Zhang, Shuai [1]
Xie, Minghong [1]
Affiliations
[1] Kunming University of Science and Technology, Faculty of Information Engineering and Automation, Kunming, People's Republic of China
Source
Frontiers in Physics | 2024 / Vol. 12
Keywords
RGB-D semantic segmentation; attention mechanism; feature fusion; multi-modal interaction; feature enhancement; INFORMATION; FUSION
DOI
10.3389/fphy.2024.1411559
Chinese Library Classification
O4 [Physics]
Discipline classification code
0702
Abstract
Semantic segmentation of RGB-D images requires understanding both the appearance of objects and their spatial relationships within a scene. In indoor scenes, diverse and cluttered objects, illumination variations, and the influence of adjacent objects can easily cause pixels to be misclassified, which degrades the segmentation result. In response to these challenges, we propose a Multi-modal Interaction and Pooling Attention Network (MIPANet). The network exploits the interaction between the RGB and depth modalities to make better use of their complementary information and improve segmentation accuracy. Specifically, we incorporate a Multi-modal Interaction Module (MIM) into the deepest layers of the network. This module fuses RGB and depth information so that the two modalities can enhance and correct each other. Moreover, we introduce a Pooling Attention Module (PAM) at various stages of the encoder to enhance the features extracted by the network. The outputs of the PAMs at different stages are selectively integrated into the decoder through a refinement module to improve semantic segmentation performance. Experimental results demonstrate that MIPANet outperforms existing methods on two indoor scene datasets, NYU-Depth V2 and SUN-RGBD, by addressing the insufficient information interaction between modalities in RGB-D semantic segmentation. The source code is available at https://github.com/2295104718/MIPANet.
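The abstract does not specify how the PAM works internally, so the following is only a minimal sketch of the general idea of pooling-based channel attention, not the authors' module. It assumes a squeeze-and-gate design: each channel is averaged to one value, the pooled values are mixed linearly, and a sigmoid gate rescales the channel. The function names and the `weights` and `bias` parameters are hypothetical and do not come from the MIPANet repository.

```python
import math


def global_average_pool(feature):
    """Reduce each channel (an H x W grid) to its mean value."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature
    ]


def channel_gates(pooled, weights, bias):
    """Mix pooled descriptors linearly, then squash each result into (0, 1)."""
    gates = []
    for row, b in zip(weights, bias):
        z = sum(w * p for w, p in zip(row, pooled)) + b
        gates.append(1.0 / (1.0 + math.exp(-z)))  # sigmoid
    return gates


def pooling_attention(feature, weights, bias):
    """Rescale every channel of `feature` by its attention gate.

    Assumption: this is a generic channel-attention sketch, not the PAM
    defined in the paper.
    """
    gates = channel_gates(global_average_pool(feature), weights, bias)
    return [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(feature, gates)
    ]


# Toy input: 2 channels of size 2 x 2; zero weights and bias give every gate 0.5.
feature = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
weights = [[0.0, 0.0], [0.0, 0.0]]
bias = [0.0, 0.0]
enhanced = pooling_attention(feature, weights, bias)
```

In a trained network the mixing weights would be learned, so informative channels would receive gates near 1 and less useful ones would be suppressed. With the zero weights used here, every channel is simply halved.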
Pages: 13