MFFNet: Multi-Modal Feature Fusion Network for V-D-T Salient Object Detection

Cited by: 12
Authors
Wan, Bin [1]
Zhou, Xiaofei [1]
Sun, Yaoqi [2]
Wang, Tingyu [1]
Lv, Chengtao [1]
Wang, Shuai [3]
Yin, Haibing [4]
Yan, Chenggang [1]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Automat, Hangzhou 310018, Peoples R China
[2] Hangzhou Dianzi Univ, Lishui Inst, Sch Automat, Hangzhou 310018, Peoples R China
[3] Hangzhou Dianzi Univ, Lishui Inst, Sch Cyberspace, Hangzhou 310018, Peoples R China
[4] Hangzhou Dianzi Univ, Sch Commun Engn, Lishui Inst, Hangzhou 310018, Peoples R China
Keywords
Multi-modal feature fusion network; V-D-T salient object detection; triple-modal deep fusion encoder; progressive feature enhancement decoder;
DOI
10.1109/TMM.2023.3291823
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
This article discusses the limitations of single- and two-modal salient object detection (SOD) methods and the emergence of multi-modal SOD techniques that integrate Visible, Depth, or Thermal information. However, current multi-modal methods often rely on simple fusion techniques, such as addition, multiplication, and concatenation, to combine the different modalities, which is ineffective in challenging scenes such as those with low illumination and cluttered backgrounds. To address this issue, we propose a novel multi-modal feature fusion network (MFFNet) for V-D-T salient object detection, whose two key components are the triple-modal deep fusion encoder and the progressive feature enhancement decoder. MFFNet's triple-modal deep fusion (TDF) module is designed to integrate the features of the three modalities and explore their complementarity through mutual optimization during the encoding phase. In addition, the progressive feature enhancement decoder consists of the weighted context-enhanced feature (WCF) module, the region optimization (RO) module, and the boundary perception (BP) module, which produce region-aware and contour-aware features. A multi-scale fusion (MF) module then integrates these features to generate high-quality saliency maps. We conduct extensive experiments on the VDT-2048 dataset, and our results show that the proposed MFFNet outperforms 12 state-of-the-art multi-modal methods.
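For orientation, the three simple fusion baselines that the abstract names (addition, multiplication, and concatenation) can be sketched as follows. This is only an illustrative toy under stated assumptions, not the MFFNet method or its TDF module: the `Feature` type, the function names, and the toy values are all hypothetical, and each modality is assumed to be a list of channels with identical shapes.

```python
from typing import List

# Hypothetical toy representation: a feature map is a list of channels,
# and each channel is a flat list of activations.
Feature = List[List[float]]


def fuse_add(v: Feature, d: Feature, t: Feature) -> Feature:
    """Element-wise sum of the three modality features."""
    return [[a + b + c for a, b, c in zip(cv, cd, ct)]
            for cv, cd, ct in zip(v, d, t)]


def fuse_mul(v: Feature, d: Feature, t: Feature) -> Feature:
    """Element-wise product of the three modality features."""
    return [[a * b * c for a, b, c in zip(cv, cd, ct)]
            for cv, cd, ct in zip(v, d, t)]


def fuse_concat(v: Feature, d: Feature, t: Feature) -> Feature:
    """Channel-wise concatenation: the channel count triples."""
    return v + d + t


# Toy visible (V), depth (D), and thermal (T) features: 2 channels x 2 values.
visible = [[1.0, 2.0], [3.0, 4.0]]
depth = [[0.5, 0.5], [1.0, 1.0]]
thermal = [[2.0, 0.0], [0.0, 2.0]]

added = fuse_add(visible, depth, thermal)
multiplied = fuse_mul(visible, depth, thermal)
concatenated = fuse_concat(visible, depth, thermal)
```

None of these operations adapts to the reliability of an individual modality. In the multiplication case, for example, a zero in the thermal feature suppresses the fused response regardless of the other two inputs, which illustrates the kind of weakness the abstract attributes to simple fusion in challenging scenes.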
Pages: 2069-2081
Number of pages: 13
Related papers
50 in total
  • [41] ObjectFusion: Multi-modal 3D Object Detection with Object-Centric Fusion
    Cai, Qi
    Pan, Yingwei
    Yao, Ting
    Ngo, Chong-Wah
    Mei, Tao
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 18021 - 18030
  • [42] Cross-modal and multi-level feature refinement network for RGB-D salient object detection
    Gao, Yue
    Dai, Meng
    Zhang, Qing
    VISUAL COMPUTER, 2023, 39 (09): : 3979 - 3994
  • [44] M3Net: Multi-scale Multi-path Multi-modal Fusion Network and Example Application to RGB-D Salient Object Detection
    Chen, Hao
    Li, You-Fu
    Su, Dan
    2017 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2017, : 4911 - 4916
  • [45] M2RNet: Multi-modal and multi-scale refined network for RGB-D salient object detection
    Fang, Xian
    Jiang, Mingfeng
    Zhu, Jinchao
    Shao, Xiuli
    Wang, Hongpeng
    PATTERN RECOGNITION, 2023, 135
  • [46] Cascade fusion of multi-modal and multi-source feature fusion by the attention for three-dimensional object detection
    Yu, Fengning
    Lian, Jing
    Li, Linhui
    Zhao, Jian
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 133
  • [47] Multi-modal object detection via transformer network
    Liu, Wenbing
    Wang, Haibo
    Gao, Quanxue
    Zhu, Zhaorui
    IET IMAGE PROCESSING, 2023, 17 (12) : 3541 - 3550
  • [48] Deformable Feature Aggregation for Dynamic Multi-modal 3D Object Detection
    Chen, Zehui
    Li, Zhenyu
    Zhang, Shiquan
    Fang, Liangji
    Jiang, Qinhong
    Zhao, Feng
    COMPUTER VISION, ECCV 2022, PT VIII, 2022, 13668 : 628 - 644
  • [49] Deep multi-scale and multi-modal fusion for 3D object detection
    Guo, Rui
    Li, Deng
    Han, Yahong
    PATTERN RECOGNITION LETTERS, 2021, 151 : 236 - 242
  • [50] FuseNet: a multi-modal feature fusion network for 3D shape classification
    Zhao, Xin
    Chen, Yinhuang
    Yang, Chengzhuan
    Fang, Lincong
    VISUAL COMPUTER, 2025, 41 (04): : 2973 - 2985