Unified Information Fusion Network for Multi-Modal RGB-D and RGB-T Salient Object Detection

Cited by: 116

Authors:

Gao, Wei [1,2]

Liao, Guibiao [1,2]

Ma, Siwei [3]

Li, Ge [1,2]

Liang, Yongsheng [4]

Lin, Weisi [5]

Affiliations:

[1] Peking Univ, Sch Elect & Comp Engn, Shenzhen Grad Sch, Shenzhen 518055, Peoples R China

[2] Peng Cheng Lab, Shenzhen 518066, Peoples R China

[3] Peking Univ, Inst Digital Media, Beijing 100871, Peoples R China

[4] Harbin Inst Technol, Sch Elect & Informat Engn, Shenzhen 518055, Peoples R China

[5] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore 639798, Singapore

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2022, Vol. 32, No. 4

Keywords:

Dynamic cross-modal guided mechanism; RGB-D/RGB-T multi-modal data; information fusion; salient object detection; VISUAL-ATTENTION; COLOR-VISION; IMAGE; SEGMENTATION; MECHANISMS; MODEL;

DOI:

10.1109/TCSVT.2021.3082939

Chinese Library Classification (CLC):

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

The use of complementary information, namely depth or thermal information, has shown its benefits to salient object detection (SOD) in recent years. However, the RGB-D and RGB-T SOD problems are currently solved only independently, and most existing methods directly extract and fuse raw features from backbones. Such methods can be easily restricted by low-quality modality data and redundant cross-modal features. In this work, a unified end-to-end framework is designed to simultaneously analyze RGB-D and RGB-T SOD tasks. Specifically, to effectively tackle multi-modal features, we propose a novel multi-stage and multi-scale fusion network (MMNet), which consists of a cross-modal multi-stage fusion module (CMFM) and a bi-directional multi-scale decoder (BMD). Similar to the visual color stage doctrine in the human visual system (HVS), the proposed CMFM aims to explore important feature representations in the feature response stage, and integrate them into cross-modal features in the adversarial combination stage. Moreover, the proposed BMD learns the combination of multi-level cross-modal fused features to capture both local and global information of salient objects, and can further boost the multi-modal SOD performance. The proposed unified cross-modality feature analysis framework, based on two-stage and multi-scale information fusion, can be used for diverse multi-modal SOD tasks. Comprehensive experiments (approximately 92K image pairs) demonstrate that the proposed method consistently outperforms 21 other state-of-the-art methods on nine benchmark datasets. This validates that our proposed method can work well on diverse multi-modal SOD tasks with good generalization and robustness, and provides a good multi-modal SOD benchmark.

Pages: 2091-2106

Page count: 16

