MFFNet: Multi-Modal Feature Fusion Network for V-D-T Salient Object Detection

Times cited: 12
Authors
Wan, Bin [1]
Zhou, Xiaofei [1]
Sun, Yaoqi [2]
Wang, Tingyu [1]
Lv, Chengtao [1]
Wang, Shuai [3]
Yin, Haibing [4]
Yan, Chenggang [1]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Automat, Hangzhou 310018, Peoples R China
[2] Hangzhou Dianzi Univ, Lishui Inst, Sch Automat, Hangzhou 310018, Peoples R China
[3] Hangzhou Dianzi Univ, Lishui Inst, Sch Cyberspace, Hangzhou 310018, Peoples R China
[4] Hangzhou Dianzi Univ, Sch Commun Engn, Lishui Inst, Hangzhou 310018, Peoples R China
Keywords
Multi-modal feature fusion network; V-D-T salient object detection; triple-modal deep fusion encoder; progressive feature enhancement decoder
DOI
10.1109/TMM.2023.3291823
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This article discusses the limitations of single- and two-modal salient object detection (SOD) methods and the emergence of multi-modal SOD techniques that integrate visible (V), depth (D), or thermal (T) information. However, current multi-modal methods often combine the modalities with simple operations such as addition, multiplication, and concatenation, which are ineffective in challenging scenes such as low illumination and cluttered backgrounds. To address this issue, we propose a novel multi-modal feature fusion network (MFFNet) for V-D-T salient object detection, whose two key components are a triple-modal deep fusion encoder and a progressive feature enhancement decoder. In the encoder, the triple-modal deep fusion (TDF) module integrates the features of the three modalities and exploits their complementarity through mutual optimization during the encoding phase. The progressive feature enhancement decoder consists of a weighted context-enhanced feature (WCF) module, a region optimization (RO) module, and a boundary perception (BP) module, which together produce region-aware and contour-aware features. A multi-scale fusion (MF) module then integrates these features to generate high-quality saliency maps. Extensive experiments on the VDT-2048 dataset show that the proposed MFFNet outperforms 12 state-of-the-art multi-modal methods.
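The abstract contrasts MFFNet with methods that fuse the modalities by simple addition, multiplication, or concatenation. The minimal Python sketch below illustrates only those baseline operations, not the TDF module, whose internals the abstract does not describe. It assumes a simplified, hypothetical feature-map representation (a list of channels, each a flat list of spatial activations, standing in for a C x H x W tensor); the names `fuse_add`, `fuse_mul`, and `fuse_concat` are illustrative, not from the paper.

```python
from typing import List

# Hypothetical representation (an assumption, not the paper's): a feature map is
# a list of channels, each channel a flat list of spatial activations.
FeatureMap = List[List[float]]


def _spatial_size(fmap: FeatureMap) -> int:
    """Return the spatial size shared by all channels, or raise ValueError."""
    sizes = {len(ch) for ch in fmap}
    if len(sizes) != 1:
        raise ValueError("all channels must have the same spatial size")
    return sizes.pop()


def _require_same_shape(*maps: FeatureMap) -> None:
    """Element-wise fusion needs identical channel counts and spatial sizes."""
    shapes = {(len(m), _spatial_size(m)) for m in maps}
    if len(shapes) != 1:
        raise ValueError("feature maps must have identical shapes")


def fuse_add(v: FeatureMap, d: FeatureMap, t: FeatureMap) -> FeatureMap:
    """Baseline: element-wise sum of visible, depth, and thermal features."""
    _require_same_shape(v, d, t)
    return [[a + b + c for a, b, c in zip(cv, cd, ct)]
            for cv, cd, ct in zip(v, d, t)]


def fuse_mul(v: FeatureMap, d: FeatureMap, t: FeatureMap) -> FeatureMap:
    """Baseline: element-wise product of the three modalities."""
    _require_same_shape(v, d, t)
    return [[a * b * c for a, b, c in zip(cv, cd, ct)]
            for cv, cd, ct in zip(v, d, t)]


def fuse_concat(v: FeatureMap, d: FeatureMap, t: FeatureMap) -> FeatureMap:
    """Baseline: channel-wise concatenation (spatial sizes must match,
    channel counts may differ)."""
    if len({_spatial_size(m) for m in (v, d, t)}) != 1:
        raise ValueError("feature maps must share the same spatial size")
    return [list(ch) for m in (v, d, t) for ch in m]


if __name__ == "__main__":
    v = [[1.0, 2.0]]
    d = [[0.5, 0.5]]
    t = [[2.0, 0.0]]
    print(fuse_add(v, d, t))     # [[3.5, 2.5]]
    print(fuse_mul(v, d, t))     # [[1.0, 0.0]]
    print(fuse_concat(v, d, t))  # [[1.0, 2.0], [0.5, 0.5], [2.0, 0.0]]
```

None of these operations has learnable parameters or adapts to how reliable each modality is at a given position; for example, a zero thermal response wipes out that position entirely under multiplication. The abstract reports that such simple fusion is ineffective in scenes with low illumination or cluttered backgrounds, which is the problem the TDF module is proposed to address.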
Pages: 2069-2081
Number of pages: 13