MFFNet: Multi-Modal Feature Fusion Network for V-D-T Salient Object Detection

Cited by: 12
Authors
Wan, Bin [1]
Zhou, Xiaofei [1]
Sun, Yaoqi [2]
Wang, Tingyu [1]
Lv, Chengtao [1]
Wang, Shuai [3]
Yin, Haibing [4]
Yan, Chenggang [1]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Automat, Hangzhou 310018, Peoples R China
[2] Hangzhou Dianzi Univ, Lishui Inst, Sch Automat, Hangzhou 310018, Peoples R China
[3] Hangzhou Dianzi Univ, Lishui Inst, Sch Cyberspace, Hangzhou 310018, Peoples R China
[4] Hangzhou Dianzi Univ, Sch Commun Engn, Lishui Inst, Hangzhou 310018, Peoples R China
Keywords
Multi-modal feature fusion network; V-D-T salient object detection; triple-modal deep fusion encoder; progressive feature enhancement decoder
DOI
10.1109/TMM.2023.3291823
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
This article discusses the limitations of single- and two-modal salient object detection (SOD) methods and the emergence of multi-modal SOD techniques that integrate Visible, Depth, or Thermal information. However, current multi-modal methods often rely on simple fusion techniques, such as addition, multiplication, and concatenation, to combine the different modalities, which is ineffective in challenging scenes such as low illumination and cluttered backgrounds. To address this issue, we propose a novel multi-modal feature fusion network (MFFNet) for V-D-T salient object detection, whose two key components are the triple-modal deep fusion encoder and the progressive feature enhancement decoder. MFFNet's triple-modal deep fusion (TDF) module is designed to integrate the features of the three modalities and to explore their complementarity through mutual optimization during the encoding phase. In addition, the progressive feature enhancement decoder consists of the weighted context-enhanced feature (WCF) module, the region optimization (RO) module, and the boundary perception (BP) module, which together produce region-aware and contour-aware features. After that, a multi-scale fusion (MF) module is proposed to integrate these features and generate high-quality saliency maps. We conduct extensive experiments on the VDT-2048 dataset, and our results show that the proposed MFFNet outperforms 12 state-of-the-art multi-modal methods.
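For orientation, the three simple fusion operations that the abstract names as the common baseline (addition, multiplication, and concatenation) can be sketched as follows. This is only an illustration of those baseline operations, not of MFFNet's TDF module. The function names, the `[channel][position]` list layout, and the toy values are assumptions made for this sketch.

```python
from typing import List

# Hypothetical layout for this sketch: feature[channel][position].
Feature = List[List[float]]


def fuse_add(v: Feature, d: Feature, t: Feature) -> Feature:
    """Element-wise sum of three same-shaped feature maps."""
    return [[a + b + c for a, b, c in zip(rv, rd, rt)]
            for rv, rd, rt in zip(v, d, t)]


def fuse_mul(v: Feature, d: Feature, t: Feature) -> Feature:
    """Element-wise product of three same-shaped feature maps."""
    return [[a * b * c for a, b, c in zip(rv, rd, rt)]
            for rv, rd, rt in zip(v, d, t)]


def fuse_concat(v: Feature, d: Feature, t: Feature) -> Feature:
    """Concatenation along the channel axis; the channel count triples."""
    return v + d + t


# Toy features: 2 channels, 2 spatial positions per modality.
visible = [[1.0, 2.0], [3.0, 4.0]]
depth = [[0.5, 0.5], [1.0, 1.0]]
thermal = [[2.0, 0.0], [1.0, 2.0]]

added = fuse_add(visible, depth, thermal)
multiplied = fuse_mul(visible, depth, thermal)
concatenated = fuse_concat(visible, depth, thermal)
```

In all three cases every modality is treated identically at every position, which gives one plausible reading of why the abstract considers such fusion weak when one modality is unreliable, for example a visible image under low illumination.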
Pages: 2069 - 2081
Page count: 13
Related papers
50 in total
  • [21] Lightweight video salient object detection via channel-shuffle enhanced multi-modal fusion network
    Kan Huang
    Zhijing Xu
    Multimedia Tools and Applications, 2024, 83 : 1025 - 1039
  • [22] Modal complementary fusion network for RGB-T salient object detection
    Ma, Shuai
    Song, Kechen
    Dong, Hongwen
    Tian, Hongkun
    Yan, Yunhui
    APPLIED INTELLIGENCE, 2023, 53 (08) : 9038 - 9055
  • [23] Modal complementary fusion network for RGB-T salient object detection
    Shuai Ma
    Kechen Song
    Hongwen Dong
    Hongkun Tian
    Yunhui Yan
    Applied Intelligence, 2023, 53 : 9038 - 9055
  • [24] Lightweight multi-level feature difference fusion network for RGB-D-T salient object detection
    Song, Kechen
    Wang, Han
    Zhao, Ying
    Huang, Liming
    Dong, Hongwen
    Yan, Yunhui
    JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2023, 35 (08)
  • [25] Gated Multi-Modal Edge Refinement Network for Light Field Salient Object Detection
    Li, Yefan
    Duan, Fuqing
    Lu, Ke
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2024, 20 (10)
  • [26] Feature extraction and fusion network for salient object detection
    Dai, Chao
    Pan, Chen
    He, Wei
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (23) : 33955 - 33969
  • [27] Feature extraction and fusion network for salient object detection
    Chao Dai
    Chen Pan
    Wei He
    Multimedia Tools and Applications, 2022, 81 : 33955 - 33969
  • [28] Hierarchical Feature Fusion Network for Salient Object Detection
    Li, Xuelong
    Song, Dawei
    Dong, Yongsheng
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 (29) : 9165 - 9175
  • [29] Selective feature fusion network for salient object detection
    Sun, Fengming
    Yuan, Xia
    Zhao, Chunxia
    IET COMPUTER VISION, 2023, 17 (04) : 483 - 495
  • [30] Multi-attention guided feature fusion network for salient object detection
    Li, Anni
    Qi, JinQing
    Lu, Huchuan
    NEUROCOMPUTING, 2020, 411 : 416 - 427