Multi-modal fusion network with multi-scale multi-path and cross-modal interactions for RGB-D salient object detection

Cited by: 250
Authors
Chen, Hao [1]
Li, Youfu [1]
Su, Dan [1]
Affiliations
[1] City Univ Hong Kong, Dept Mech Engn, 83 Tat Chee Ave, Kowloon Tong, Hong Kong, Peoples R China
Keywords
RGB-D; Convolutional neural networks; Multi-path; Saliency detection; DETECTION MODEL; VIDEO
DOI
10.1016/j.patcog.2018.08.007
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Paired RGB and depth images are becoming popular multi-modal data in computer vision tasks. Traditional methods based on Convolutional Neural Networks (CNNs) typically fuse RGB and depth by combining their deep representations at a late stage through only one path, which can be ambiguous and insufficient for fusing large amounts of cross-modal data. To address this issue, we propose a novel multi-scale multi-path fusion network with cross-modal interactions (MMCI), in which the traditional two-stream fusion architecture with a single fusion path is advanced by diversifying the fusion path into a global reasoning path and a local capturing path, while introducing cross-modal interactions in multiple layers. Compared to traditional two-stream architectures, the MMCI net is able to supply more adaptive and flexible fusion flows, thus easing the optimization and enabling sufficient and efficient fusion. At the same time, the MMCI net is equipped with multi-scale perception ability (i.e., simultaneous global and local contextual reasoning). We take RGB-D saliency detection as an example task. Extensive experiments on three benchmark datasets show the improvement of the proposed MMCI net over other state-of-the-art methods. (C) 2018 Elsevier Ltd. All rights reserved.
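The data flow described in the abstract (two streams, cross-modal interactions in several layers, then a global and a local fusion path) can be illustrated with a toy sketch. This is not the MMCI implementation: the abstract does not specify the operations, so the scaled ReLU "layers", the element-wise addition used as the interaction, the averaging used for fusion, and every function name below are assumptions chosen only to keep the example small and runnable.

```python
# Toy sketch only: plain-Python stand-ins for CNN layers, NOT the MMCI network.
# All function names and operations here are illustrative assumptions.

def layer(features, weight):
    """Stand-in for one conv block: scale each value, then apply ReLU."""
    return [[max(0.0, weight * v) for v in row] for row in features]


def add(a, b):
    """Element-wise sum of two equally sized 2-D maps."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mean_of(fmap):
    """Average over all positions of a 2-D map."""
    return sum(map(sum, fmap)) / (len(fmap) * len(fmap[0]))


def two_stream_with_interactions(rgb, depth, weights):
    """Run both streams layer by layer. After every layer the depth features
    are added into the RGB stream (assumed form of cross-modal interaction)."""
    for w in weights:
        depth = layer(depth, w)
        rgb = add(layer(rgb, w), depth)
    return rgb, depth


def global_path(rgb, depth):
    """Global path: fuse image-level averages into a single context score."""
    return 0.5 * (mean_of(rgb) + mean_of(depth))


def local_path(rgb, depth):
    """Local path: fuse the two modalities position by position."""
    return [[0.5 * (x + y) for x, y in zip(rr, dr)]
            for rr, dr in zip(rgb, depth)]


def saliency(rgb, depth, weights=(1.0, 0.5)):
    """Combine both paths: a position is marked salient when its local
    response exceeds the global context score (assumed combination rule)."""
    rgb_f, depth_f = two_stream_with_interactions(rgb, depth, weights)
    context = global_path(rgb_f, depth_f)
    local = local_path(rgb_f, depth_f)
    return [[1.0 if v > context else 0.0 for v in row] for row in local]


if __name__ == "__main__":
    rgb_map = [[0.0, 0.0], [0.0, 4.0]]
    depth_map = [[0.0, 0.0], [0.0, 2.0]]
    print(saliency(rgb_map, depth_map))
```

In this sketch the only position that is bright in both the RGB and depth inputs is the one marked salient. A real network would replace each stand-in with learned convolutional layers and a learned combination of the two paths.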
Pages: 376-385
Page count: 10
Related papers
50 in total
  • [21] Global Guided Cross-Modal Cross-Scale Network for RGB-D Salient Object Detection
    Wang, Shuaihui
    Jiang, Fengyi
    Xu, Boqian
    SENSORS, 2023, 23 (16)
  • [22] Dual attention guided multi-scale fusion network for RGB-D salient object detection
    Gao, Huan
    Guo, Jichang
    Wang, Yudong
    Dong, Jianan
    SIGNAL PROCESSING-IMAGE COMMUNICATION, 2023, 118
  • [23] MAGNet: Multi-scale Awareness and Global fusion Network for RGB-D salient object detection
    Zhong, Mingyu
    Sun, Jing
    Ren, Peng
    Wang, Fasheng
    Sun, Fuming
    KNOWLEDGE-BASED SYSTEMS, 2024, 299
  • [24] Deep multi-scale and multi-modal fusion for 3D object detection
    Guo, Rui
    Li, Deng
    Han, Yahong
    PATTERN RECOGNITION LETTERS, 2021, 151 : 236 - 242
  • [25] Multi-scale iterative refinement network for RGB-D salient object detection
    Liu, Ze-Yu
    Liu, Jian-Wei
    Zuo, Xin
    Hu, Ming-Fei
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2021, 106
  • [26] Cross-Stage Multi-Scale Interaction Network for RGB-D Salient Object Detection
    Yi, Kang
    Zhu, Jinchao
    Guo, Fu
    Xu, Jing
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 2402 - 2406
  • [27] A cross-modal adaptive gated fusion generative adversarial network for RGB-D salient object detection
    Liu, Zhengyi
    Zhang, Wei
    Zhao, Peng
    NEUROCOMPUTING, 2020, 387 : 210 - 220
  • [28] Multi-modal interactive attention and dual progressive decoding network for RGB-D/T salient object detection
    Liang, Yanhua
    Qin, Guihe
    Sun, Minghui
    Qin, Jun
    Yan, Jie
    Zhang, Zhonghan
    NEUROCOMPUTING, 2022, 490 : 132 - 145
  • [29] Lightweight cross-modal transformer for RGB-D salient object detection
    Huang, Nianchang
    Yang, Yang
    Zhang, Qiang
    Han, Jungong
    Huang, Jin
    Computer Vision and Image Understanding, 2024, 249
  • [30] Multi-modal deep feature learning for RGB-D object detection
    Xu, Xiangyang
    Li, Yuncheng
    Wu, Gangshan
    Luo, Jiebo
    PATTERN RECOGNITION, 2017, 72 : 300 - 313