UTDNet: A unified triplet decoder network for multimodal salient object detection

Cited by: 0
Authors
Huo, Fushuo [1]
Liu, Ziming [1]
Guo, Jingcai [1]
Xu, Wenchao [1]
Guo, Song [1]
Affiliations
[1] Hong Kong Polytech Univ, Dept Comp, Hong Kong, Peoples R China
Keywords
Salient object detection; Multi-modal fusion; Unified model;
DOI
10.1016/j.neunet.2023.11.051
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Image Salient Object Detection (SOD) is a fundamental research topic in computer vision. Recently, the multimodal information in RGB, Depth (D), and Thermal (T) modalities has been proven to be beneficial to SOD. However, existing methods are either designed only for RGB-D or RGB-T SOD, which may limit their use across modalities, or merely fine-tuned on specific datasets, which may bring about extra computation overhead. These defects can hinder the practical deployment of SOD in real-world applications. In this paper, we propose an end-to-end Unified Triplet Decoder Network, dubbed UTDNet, for both RGB-T and RGB-D SOD tasks. The intractable challenges for unified multimodal SOD are mainly two-fold, i.e., (1) accurately detecting and segmenting salient objects, and (2) preferably doing so via a single network that fits both RGB-T and RGB-D SOD. First, to deal with the former challenge, we propose the multi-scale feature extraction unit to enrich the discriminative contextual information, and the efficient fusion module to explore cross-modality complementary information. Then, the multimodal features are fed to the triplet decoder, where the hierarchical deep supervision loss further enables the network to capture distinctive saliency cues. Second, as to the latter challenge, we propose a simple yet effective continual learning method to unify multimodal SOD. Concretely, we sequentially train multimodal SOD tasks by applying Elastic Weight Consolidation (EWC) regularization with the hierarchical loss function to avoid catastrophic forgetting without introducing more parameters. Critically, the triplet decoder separates task-specific and task-invariant information, making the network easily adaptable to multimodal SOD tasks. Extensive comparisons with 26 recently proposed RGB-T and RGB-D SOD methods demonstrate the superiority of the proposed UTDNet.
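The abstract's combination of EWC regularization with a hierarchical loss can be illustrated with a minimal sketch. It is not the authors' implementation: it uses the standard EWC quadratic penalty, (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2, and assumes the hierarchical loss is a plain sum of per-level losses. The function names, the flat parameter lists, and the weight `lam` are all hypothetical.

```python
from typing import Sequence


def ewc_penalty(
    params: Sequence[float],
    anchor_params: Sequence[float],
    fisher_diag: Sequence[float],
    lam: float,
) -> float:
    """Standard EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta_star_i)^2.

    params        -- current weights while training the new task
    anchor_params -- weights saved after training the previous task
    fisher_diag   -- diagonal Fisher information estimated on the previous task
    lam           -- regularization strength (hypothetical hyperparameter)
    """
    if not (len(params) == len(anchor_params) == len(fisher_diag)):
        raise ValueError("all inputs must have the same length")
    total = 0.0
    for theta, theta_star, fisher in zip(params, anchor_params, fisher_diag):
        total += fisher * (theta - theta_star) ** 2
    return 0.5 * lam * total


def total_loss(
    level_losses: Sequence[float],
    params: Sequence[float],
    anchor_params: Sequence[float],
    fisher_diag: Sequence[float],
    lam: float,
) -> float:
    """Assumed form: sum of per-level supervision losses plus the EWC penalty."""
    return sum(level_losses) + ewc_penalty(params, anchor_params, fisher_diag, lam)


if __name__ == "__main__":
    # Toy numbers only: two weights, two supervision levels.
    print(total_loss([0.5, 0.25], [1.0, 2.0], [0.0, 2.0], [2.0, 5.0], lam=4.0))
```

Weights with a large Fisher value are pulled strongly toward their previous-task values, while weights with a small Fisher value remain free to adapt to the new modality. This is how the penalty limits forgetting without adding parameters to the network itself.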
Pages: 521-534
Page count: 14
Published in
Neural Networks, 2024, 170: 521-534