MFFNet: Multi-Modal Feature Fusion Network for V-D-T Salient Object Detection

Cited by: 12

Authors:

Wan, Bin [1]

Zhou, Xiaofei [1]

Sun, Yaoqi [2]

Wang, Tingyu [1]

Lv, Chengtao [1]

Wang, Shuai [3]

Yin, Haibing [4]

Yan, Chenggang [1]

Affiliations:

[1] Hangzhou Dianzi Univ, Sch Automat, Hangzhou 310018, Peoples R China

[2] Hangzhou Dianzi Univ, Lishui Inst, Sch Automat, Hangzhou 310018, Peoples R China

[3] Hangzhou Dianzi Univ, Lishui Inst, Sch Cyberspace, Hangzhou 310018, Peoples R China

[4] Hangzhou Dianzi Univ, Sch Commun Engn, Lishui Inst, Hangzhou 310018, Peoples R China

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2024 / Vol. 26

Keywords:

Multi-modal feature fusion network; V-D-T salient object detection; triple-modal deep fusion encoder; progressive feature enhancement decoder

DOI:

10.1109/TMM.2023.3291823

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

This article discusses the limitations of single- and two-modal salient object detection (SOD) methods and the emergence of multi-modal SOD techniques that integrate Visible, Depth, or Thermal information. However, current multi-modal methods often combine the different modalities with simple fusion techniques such as addition, multiplication, and concatenation, which are ineffective in challenging scenes such as low illumination and cluttered backgrounds. To address this issue, we propose a novel multi-modal feature fusion network (MFFNet) for V-D-T salient object detection, whose two key components are the triple-modal deep fusion encoder and the progressive feature enhancement decoder. The triple-modal deep fusion (TDF) module is designed to integrate the features of the three modalities and to explore their complementarity through mutual optimization during the encoding phase. The progressive feature enhancement decoder consists of the weighted context-enhanced feature (WCF) module, the region optimization (RO) module, and the boundary perception (BP) module, which together produce region-aware and contour-aware features. A multi-scale fusion (MF) module then integrates these features to generate high-quality saliency maps. We conduct extensive experiments on the VDT-2048 dataset, and our results show that the proposed MFFNet outperforms 12 state-of-the-art multi-modal methods.
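
For readers unfamiliar with the "simple fusion techniques" the abstract refers to, the following is a minimal sketch of element-wise addition, element-wise multiplication, and channel concatenation applied to three modality features. It is not the MFFNet method or its TDF module; the function names, the list-of-channels representation, and the toy values are all assumptions made for illustration.

```python
# Illustrative sketch only: the three simple fusion strategies named in the
# abstract. This is NOT the paper's TDF module. A feature map is modelled
# here (as an assumption) as a list of channels, each a flat list of floats.

def fuse_add(visible, depth, thermal):
    """Element-wise sum of the three modalities; channel count is unchanged."""
    return [[v + d + t for v, d, t in zip(cv, cd, ct)]
            for cv, cd, ct in zip(visible, depth, thermal)]

def fuse_multiply(visible, depth, thermal):
    """Element-wise product of the three modalities; channel count is unchanged."""
    return [[v * d * t for v, d, t in zip(cv, cd, ct)]
            for cv, cd, ct in zip(visible, depth, thermal)]

def fuse_concat(visible, depth, thermal):
    """Channel-wise concatenation; the channel count triples."""
    return [list(c) for c in visible + depth + thermal]

# Hypothetical toy features: 2 channels with 3 spatial positions each.
visible = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]]
depth   = [[0.0, 1.0, 0.0], [2.0, 2.0, 2.0]]
thermal = [[1.0, 0.0, 1.0], [1.0, 1.0, 1.0]]

added = fuse_add(visible, depth, thermal)
multiplied = fuse_multiply(visible, depth, thermal)
concatenated = fuse_concat(visible, depth, thermal)

print(added)              # [[2.0, 3.0, 4.0], [3.5, 3.5, 3.5]]
print(multiplied)         # [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
print(len(concatenated))  # 6
```

In this toy case the first channel of the product is all zeros, because at every position at least one modality contributes a zero. That is one way a fixed, non-adaptive rule can discard information when a single modality is degraded, which is consistent with the motivation the abstract gives for a learned fusion encoder.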


Pages: 2069-2081

Page count: 13

50 records in total

[1] IFENet: Interaction, Fusion, and Enhancement Network for V-D-T Salient Object Detection
Bao, Liuxin
Zhou, Xiaofei
Zheng, Bolun
Cong, Runmin
Yin, Haibing
Zhang, Jiyong
Yan, Chenggang
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2025, 34 : 483 - 494
[2] TMNet: Triple-modal interaction encoder and multi-scale fusion decoder network for V-D-T salient object detection
Wan, Bin
Lv, Chengtao
Zhou, Xiaofei
Sun, Yaoqi
Zhu, Zunjie
Wang, Hongkui
Yan, Chenggang
PATTERN RECOGNITION, 2024, 147
[3] Quality-Aware Selective Fusion Network for V-D-T Salient Object Detection
Bao, Liuxin
Zhou, Xiaofei
Lu, Xiankai
Sun, Yaoqi
Yin, Haibing
Hu, Zhenghui
Zhang, Jiyong
Yan, Chenggang
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33 : 3212 - 3226
[4] BMFNet: Bifurcated multi-modal fusion network for RGB-D salient object detection
Sun, Chenwang
Zhang, Qing
Zhuang, Chenyu
Zhang, Mingqian
IMAGE AND VISION COMPUTING, 2024, 147
[5] Deformable Feature Fusion Network for Multi-Modal 3D Object Detection
Guo, Kun
Gan, Tong
Ding, Zhao
Ling, Qiang
2024 3RD INTERNATIONAL CONFERENCE ON ROBOTICS, ARTIFICIAL INTELLIGENCE AND INTELLIGENT CONTROL, RAIIC 2024, 2024, : 363 - 367
[6] Multi-Modal Weights Sharing and Hierarchical Feature Fusion for RGBD Salient Object Detection
Xiao, Fen
Li, Bin
Peng, Yimu
Cao, Chunhong
Hu, Kai
Gao, Xieping
IEEE ACCESS, 2020, 8 : 26602 - 26611
[7] Unified Information Fusion Network for Multi-Modal RGB-D and RGB-T Salient Object Detection
Gao, Wei
Liao, Guibiao
Ma, Siwei
Li, Ge
Liang, Yongsheng
Lin, Weisi
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (04) : 2091 - 2106
[8] MULTI-MODAL FEATURE FUSION NETWORK FOR GHOST IMAGING OBJECT DETECTION
Hu, Nan
Ma, Huimin
Le, Chao
Shao, Xuehui
2018 25TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2018, : 351 - 355
[9] RGB-D Salient Object Detection Based on Multi-Modal Feature Interaction
Gao, Yue
Dai, Meng
Zhang, Qing
Computer Engineering and Applications, 2024, 60 (02) : 211 - 220
[10] Learning Adaptive Fusion Bank for Multi-Modal Salient Object Detection
Wang, Kunpeng
Tu, Zhengzheng
Li, Chenglong
Zhang, Cheng
Luo, Bin
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (08) : 7344 - 7358
