Multi-modal fusion network with multi-scale multi-path and cross-modal interactions for RGB-D salient object detection

Cited: 250
Authors
Chen, Hao [1 ]
Li, Youfu [1 ]
Su, Dan [1 ]
Affiliation
[1] City Univ Hong Kong, Dept Mech Engn, 83 Tat Chee Ave, Kowloon Tong, Hong Kong, Peoples R China
Keywords
RGB-D; Convolutional neural networks; Multi-path; Saliency detection; DETECTION MODEL; VIDEO;
DOI
10.1016/j.patcog.2018.08.007
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Paired RGB and depth images are becoming popular multi-modal data in computer vision tasks. Traditional methods based on Convolutional Neural Networks (CNNs) typically fuse RGB and depth by combining their deep representations at a late stage through a single path, which can be ambiguous and insufficient for fusing large amounts of cross-modal data. To address this issue, we propose a novel multi-scale multi-path fusion network with cross-modal interactions (MMCI), which advances the traditional two-stream architecture with a single fusion path by diversifying the fusion into a global reasoning path and a local capturing path, while introducing cross-modal interactions at multiple layers. Compared to traditional two-stream architectures, the MMCI net supplies more adaptive and flexible fusion flows, easing optimization and enabling sufficient and efficient fusion. Concurrently, the MMCI net is equipped with multi-scale perception (i.e., simultaneous global and local contextual reasoning). We take RGB-D saliency detection as an example task. Extensive experiments on three benchmark datasets show the improvement of the proposed MMCI net over other state-of-the-art methods. (C) 2018 Elsevier Ltd. All rights reserved.
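The fusion scheme sketched in the abstract (two streams, a cross-modal interaction, then a global reasoning path plus a local capturing path) can be illustrated with a toy numpy example. Every operator here is an illustrative stand-in, not the paper's actual layers: element-wise summation for the interaction, global average pooling for the global path, and a 3x3 box filter in place of convolutions for the local path.

```python
import numpy as np

def cross_modal_interact(rgb, depth):
    # Cross-modal interaction: inject depth features into the RGB stream.
    # Element-wise summation is an assumed stand-in for the paper's operator.
    return rgb + depth, depth

def global_path(feat):
    # Global reasoning path: global average pooling over spatial dims,
    # then broadcast the context vector back to every location.
    context = feat.mean(axis=(1, 2), keepdims=True)   # shape (C, 1, 1)
    return np.broadcast_to(context, feat.shape)

def local_path(feat):
    # Local capturing path: a 3x3 box filter as a stand-in for conv layers.
    c, h, w = feat.shape
    padded = np.pad(feat, ((0, 0), (1, 1), (1, 1)), mode="edge")
    out = np.zeros_like(feat)
    for dy in range(3):
        for dx in range(3):
            out += padded[:, dy:dy + h, dx:dx + w]
    return out / 9.0

def mmci_fuse(rgb, depth):
    # Two streams with a cross-modal interaction, then two fusion paths
    # whose outputs are summed into one fused representation.
    rgb, depth = cross_modal_interact(rgb, depth)
    fused = np.concatenate([rgb, depth], axis=0)      # channel-wise concat
    return global_path(fused) + local_path(fused)

rgb = np.random.rand(4, 8, 8)    # (channels, height, width)
depth = np.random.rand(4, 8, 8)
out = mmci_fuse(rgb, depth)
print(out.shape)  # (8, 8, 8): doubled channels, same spatial size
```

The point of the two paths is that the global branch carries image-level context while the local branch preserves spatial detail; summing them gives the decoder both cues at once.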
Pages: 376-385
Page count: 10
Related papers (10 of 50 shown)
  • [1] M3Net: Multi-scale Multi-path Multi-modal Fusion Network and Example Application to RGB-D Salient Object Detection
    Chen, Hao
    Li, You-Fu
    Su, Dan
    2017 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2017, : 4911 - 4916
  • [2] Progressive Guided Fusion Network With Multi-Modal and Multi-Scale Attention for RGB-D Salient Object Detection
    Wu, Jiajia
    Han, Guangliang
    Wang, Haining
    Yang, Hang
    Li, Qingqing
    Liu, Dongxu
    Ye, Fangjian
    Liu, Peixun
    IEEE ACCESS, 2021, 9 : 150608 - 150622
  • [3] Multi-scale Cross-Modal Transformer Network for RGB-D Object Detection
    Xiao, Zhibin
    Xie, Pengwei
    Wang, Guijin
    MULTIMEDIA MODELING (MMM 2022), PT I, 2022, 13141 : 352 - 363
  • [4] Feature Enhancement and Multi-scale Cross-Modal Attention for RGB-D Salient Object Detection
    Wan, Xin
    Yang, Gang
    Zhou, Boyi
    Liu, Chang
    Wang, Hangxu
    Wang, Yutao
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2021, PT II, 2021, 13020 : 409 - 420
  • [5] BMFNet: Bifurcated multi-modal fusion network for RGB-D salient object detection
    Sun, Chenwang
    Zhang, Qing
    Zhuang, Chenyu
    Zhang, Mingqian
    IMAGE AND VISION COMPUTING, 2024, 147
  • [6] MULTI-MODAL TRANSFORMER FOR RGB-D SALIENT OBJECT DETECTION
    Song, Peipei
    Zhang, Jing
    Koniusz, Piotr
    Barnes, Nick
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 2466 - 2470
  • [7] M2RNet: Multi-modal and multi-scale refined network for RGB-D salient object detection
    Fang, Xian
    Jiang, Mingfeng
    Zhu, Jinchao
    Shao, Xiuli
    Wang, Hongpeng
    PATTERN RECOGNITION, 2023, 135
  • [8] Multi-level cross-modal interaction network for RGB-D salient object detection
    Huang, Zhou
    Chen, Huai-Xin
    Zhou, Tao
    Yang, Yun-Zhi
    Liu, Bi-Yuan
    NEUROCOMPUTING, 2021, 452 : 200 - 211
  • [9] Unified Information Fusion Network for Multi-Modal RGB-D and RGB-T Salient Object Detection
    Gao, Wei
    Liao, Guibiao
    Ma, Siwei
    Li, Ge
    Liang, Yongsheng
    Lin, Weisi
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (04) : 2091 - 2106
  • [10] Cross-modal and multi-level feature refinement network for RGB-D salient object detection
    Gao, Yue
    Dai, Meng
    Zhang, Qing
    VISUAL COMPUTER, 2023, 39 (09): : 3979 - 3994