MFFNet: Multi-Modal Feature Fusion Network for V-D-T Salient Object Detection

Cited by: 12
Authors
Wan, Bin [1]
Zhou, Xiaofei [1]
Sun, Yaoqi [2]
Wang, Tingyu [1]
Lv, Chengtao [1]
Wang, Shuai [3]
Yin, Haibing [4]
Yan, Chenggang [1]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Automat, Hangzhou 310018, Peoples R China
[2] Hangzhou Dianzi Univ, Lishui Inst, Sch Automat, Hangzhou 310018, Peoples R China
[3] Hangzhou Dianzi Univ, Lishui Inst, Sch Cyberspace, Hangzhou 310018, Peoples R China
[4] Hangzhou Dianzi Univ, Sch Commun Engn, Lishui Inst, Hangzhou 310018, Peoples R China
Keywords
Multi-modal feature fusion network; V-D-T salient object detection; triple-modal deep fusion encoder; progressive feature enhancement decoder
DOI
10.1109/TMM.2023.3291823
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
This article discusses the limitations of single- and two-modal salient object detection (SOD) methods and the emergence of multi-modal SOD techniques that integrate Visible, Depth, or Thermal information. However, current multi-modal methods often rely on simple fusion techniques, such as addition, multiplication, and concatenation, to combine the different modalities, and these are ineffective in challenging scenes such as low illumination and cluttered backgrounds. To address this issue, we propose a novel multi-modal feature fusion network (MFFNet) for V-D-T salient object detection, whose two key components are the triple-modal deep fusion encoder and the progressive feature enhancement decoder. The triple-modal deep fusion (TDF) module is designed to integrate the features of the three modalities and to explore their complementarity through mutual optimization during the encoding phase. The progressive feature enhancement decoder consists of the weighted context-enhanced feature (WCF) module, the region optimization (RO) module, and the boundary perception (BP) module, which together produce region-aware and contour-aware features. A multi-scale fusion (MF) module then integrates these features to generate high-quality saliency maps. We conduct extensive experiments on the VDT-2048 dataset, and the results show that the proposed MFFNet outperforms 12 state-of-the-art multi-modal methods.
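The "simple fusion techniques" that the abstract contrasts with MFFNet can be illustrated with a minimal sketch. This is not the paper's TDF module or any part of MFFNet; it only shows element-wise addition, element-wise multiplication, and concatenation applied to three toy feature vectors. The function names, the flat-list representation, and the sample values are all assumptions made for illustration.

```python
from typing import List

# Hypothetical representation: one flattened feature vector per modality.
Feature = List[float]


def _check_same_length(v: Feature, d: Feature, t: Feature) -> None:
    """Element-wise fusion only makes sense for equally sized features."""
    if not (len(v) == len(d) == len(t)):
        raise ValueError("all three modality features must have the same length")


def fuse_add(v: Feature, d: Feature, t: Feature) -> Feature:
    """Element-wise addition of visible, depth, and thermal features."""
    _check_same_length(v, d, t)
    return [a + b + c for a, b, c in zip(v, d, t)]


def fuse_mul(v: Feature, d: Feature, t: Feature) -> Feature:
    """Element-wise multiplication of the three features."""
    _check_same_length(v, d, t)
    return [a * b * c for a, b, c in zip(v, d, t)]


def fuse_concat(v: Feature, d: Feature, t: Feature) -> Feature:
    """Concatenation: keeps every response but triples the feature length."""
    return v + d + t


# Toy values (assumed): the thermal response is zero at the first position
# and the depth response is zero at the second.
visible = [1.0, 0.25, 0.5]
depth = [0.5, 0.0, 0.5]
thermal = [0.0, 0.5, 0.5]

added = fuse_add(visible, depth, thermal)
multiplied = fuse_mul(visible, depth, thermal)
concatenated = fuse_concat(visible, depth, thermal)
```

In this toy case, multiplication zeroes out every position where any single modality responds with zero, and addition weights all modalities equally whether or not they are reliable. That is one plausible reading of why such operators may struggle when a modality degrades, for example under low illumination.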
Pages: 2069-2081
Number of pages: 13