Multi-modal fusion network with multi-scale multi-path and cross-modal interactions for RGB-D salient object detection

Cited by: 250
Authors
Chen, Hao [1]
Li, Youfu [1]
Su, Dan [1]
Affiliation
[1] City Univ Hong Kong, Dept Mech Engn, 83 Tat Chee Ave, Kowloon Tong, Hong Kong, Peoples R China
Keywords
RGB-D; Convolutional neural networks; Multi-path; Saliency detection; DETECTION MODEL; VIDEO
DOI
10.1016/j.patcog.2018.08.007
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Paired RGB and depth images are becoming popular multi-modal data adopted in computer vision tasks. Traditional methods based on Convolutional Neural Networks (CNNs) typically fuse RGB and depth by combining their deep representations in a late stage with only one path, which can be ambiguous and insufficient for fusing large amounts of cross-modal data. To address this issue, we propose a novel multi-scale multi-path fusion network with cross-modal interactions (MMCI), in which the traditional two-stream fusion architecture with single fusion path is advanced by diversifying the fusion path to a global reasoning one and another local capturing one and meanwhile introducing cross-modal interactions in multiple layers. Compared to traditional two-stream architectures, the MMCI net is able to supply more adaptive and flexible fusion flows, thus easing the optimization and enabling sufficient and efficient fusion. Concurrently, the MMCI net is equipped with multi-scale perception ability (i.e., simultaneously global and local contextual reasoning). We take RGB-D saliency detection as an example task. Extensive experiments on three benchmark datasets show the improvement of the proposed MMCI net over other state-of-the-art methods. (C) 2018 Elsevier Ltd. All rights reserved.
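The fusion flow described in the abstract can be pictured with a toy sketch. The abstract does not specify any layer design, so everything below is an assumption made for illustration: the scalar "layers", the element-wise sum used as the cross-modal interaction, the mean used as the global path, the per-position average used as the local path, and all function names. It is not the published MMCI implementation.

```python
import math

# Toy illustration only: every operation here is an assumption,
# not the MMCI architecture as published.

def relu_layer(features, weight):
    """One toy layer: scale each feature by a scalar weight, then apply ReLU."""
    return [max(0.0, weight * v) for v in features]

def two_stream_features(rgb, depth, weights):
    """Run an RGB stream and a depth stream side by side.

    After every layer the depth features are added into the RGB stream,
    standing in for a cross-modal interaction in multiple layers.
    """
    for w_rgb, w_depth in weights:
        depth = relu_layer(depth, w_depth)
        rgb = relu_layer(rgb, w_rgb)
        rgb = [r + d for r, d in zip(rgb, depth)]  # assumed interaction: element-wise sum
    return rgb, depth

def global_path(rgb, depth):
    """Global path (assumed): one context value summarising both modalities."""
    both = rgb + depth
    return sum(both) / len(both)

def local_path(rgb, depth):
    """Local path (assumed): keep per-position detail from both modalities."""
    return [0.5 * (r + d) for r, d in zip(rgb, depth)]

def fuse(rgb, depth, weights):
    """Combine the global and local paths into a per-position saliency score in (0, 1)."""
    f_rgb, f_depth = two_stream_features(rgb, depth, weights)
    context = global_path(f_rgb, f_depth)
    detail = local_path(f_rgb, f_depth)
    return [1.0 / (1.0 + math.exp(-(d + context))) for d in detail]

# Three "pixels"; the middle one is bright in RGB and close in depth.
saliency = fuse([0.2, 0.8, 0.1], [0.0, 1.0, 0.0], [(1.0, 1.0), (0.5, 0.5)])
```

In this toy setting the middle position receives the highest score because both modalities agree on it; a real network would use learned convolutional layers rather than scalar weights.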
Pages: 376-385
Number of pages: 10