Transformer-Based Cross-Modal Integration Network for RGB-T Salient Object Detection

Cited by: 1
Authors
Lv, Chengtao [1 ]
Zhou, Xiaofei [1 ]
Wan, Bin [1 ]
Wang, Shuai [2 ,3 ]
Sun, Yaoqi [1 ,3 ]
Zhang, Jiyong [1 ]
Yan, Chenggang [2 ]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Automat, Hangzhou 310018, Peoples R China
[2] Hangzhou Dianzi Univ, Sch Commun Engn, Hangzhou 310018, Peoples R China
[3] Hangzhou Dianzi Univ, Lishui Inst, Lishui 323000, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Task analysis; Transformers; Semantics; Decoding; Aggregates; Object detection; Fuses; Salient object detection; collaborative spatial attention; feature interaction; Swin transformer; interactive complement; IMAGE; KERNEL;
DOI
10.1109/TCE.2024.3390841
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Discipline Codes
0808 ; 0809 ;
Abstract
Salient object detection (SOD) can be applied in the consumer electronics area, where it helps to identify and locate objects of interest. RGB/RGB-D (depth) salient object detection has made great progress in recent years. However, there is still considerable room for improvement in exploiting the complementarity of two-modal information for RGB-T (thermal) SOD. Therefore, this paper proposes a Transformer-based Cross-modal Integration Network (i.e., TCINet) to detect salient objects in RGB-T images, which can properly fuse two-modal features and interactively aggregate two-level features. Our method consists of Siamese Swin Transformer-based encoders, a cross-modal feature fusion (CFF) module, and an interaction-based feature decoding (IFD) block. Here, the CFF module is designed to fuse the complementary information of the two-modal features, where collaborative spatial attention emphasizes salient regions and suppresses background regions of the two-modal features. Furthermore, we deploy the IFD block to aggregate two-level features, namely the previous-level fused feature and the current-level encoder feature, where the IFD block bridges the large semantic gap and reduces noise. Extensive experiments are conducted on three RGB-T datasets, and the experimental results clearly demonstrate the superiority and effectiveness of our method compared with cutting-edge saliency methods. The results and code of our method will be available at https://github.com/lvchengtao/TCINet.
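The collaborative spatial attention described in the abstract can be sketched roughly as follows. This is a hypothetical NumPy illustration, not the authors' implementation: the channel-pooling choices, the sigmoid squashing, and the multiplicative combination of the two modality attention maps are all assumptions made for the sketch.

```python
import numpy as np

def spatial_attention(feat):
    """Collapse a (C, H, W) feature map to a (H, W) attention map in (0, 1)."""
    avg = feat.mean(axis=0)            # channel-wise average pooling
    mx = feat.max(axis=0)              # channel-wise max pooling
    score = avg + mx                   # simple combination (assumption)
    return 1.0 / (1.0 + np.exp(-score))  # sigmoid to (0, 1)

def collaborative_fuse(rgb, thermal):
    """Hypothetical collaborative spatial attention fusion.

    Each modality yields its own spatial attention map; multiplying the
    two maps emphasizes regions both modalities agree are salient and
    suppresses background, and the reweighted features are summed.
    """
    joint = spatial_attention(rgb) * spatial_attention(thermal)
    return rgb * joint + thermal * joint  # broadcast (H, W) over channels

rgb = np.random.rand(4, 8, 8)      # toy RGB feature map: 4 channels, 8x8
thermal = np.random.rand(4, 8, 8)  # toy thermal feature map
fused = collaborative_fuse(rgb, thermal)
print(fused.shape)  # (4, 8, 8)
```

In the paper the fused feature at each level is then passed to the IFD block together with the previous-level fused feature for decoding; this sketch only covers the attention-based fusion step.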
Pages: 4741 - 4755
Number of pages: 15
Related Papers
50 records in total
  • [1] Asymmetric cross-modal activation network for RGB-T salient object detection
    Xu, Chang
    Li, Qingwu
    Zhou, Qingkai
    Jiang, Xiongbiao
    Yu, Dabing
    Zhou, Yaqin
    [J]. KNOWLEDGE-BASED SYSTEMS, 2022, 258
  • [2] Transformer-based cross-modality interaction guidance network for RGB-T salient object detection
    Luo, Jincheng
    Li, Yongjun
    Li, Bo
    Zhang, Xinru
    Li, Chaoyue
    Chenjin, Zhimin
    He, Jingyi
    Liang, Yifei
    [J]. NEUROCOMPUTING, 2024, 600
  • [3] Lightweight Cross-Modal Information Mutual Reinforcement Network for RGB-T Salient Object Detection
    Lv, Chengtao
    Wan, Bin
    Zhou, Xiaofei
    Sun, Yaoqi
    Zhang, Jiyong
    Yan, Chenggang
    [J]. ENTROPY, 2024, 26 (02)
  • [4] CAE-Net: Cross-Modal Attention Enhancement Network for RGB-T Salient Object Detection
    Lv, Chengtao
    Wan, Bin
    Zhou, Xiaofei
    Sun, Yaoqi
    Hu, Ji
    Zhang, Jiyong
    Yan, Chenggang
    [J]. ELECTRONICS, 2023, 12 (04)
  • [5] Modal complementary fusion network for RGB-T salient object detection
    Ma, Shuai
    Song, Kechen
    Dong, Hongwen
    Tian, Hongkun
    Yan, Yunhui
    [J]. APPLIED INTELLIGENCE, 2023, 53 (08) : 9038 - 9055
  • [6] Lightweight cross-modal transformer for RGB-D salient object detection
    Huang, Nianchang
    Yang, Yang
    Zhang, Qiang
    Han, Jungong
    Huang, Jin
    [J]. COMPUTER VISION AND IMAGE UNDERSTANDING, 2024, 249
  • [7] Feature aggregation with transformer for RGB-T salient object detection
    Zhang, Ping
    Xu, Mengnan
    Zhang, Ziyan
    Gao, Pan
    Zhang, Jing
    [J]. NEUROCOMPUTING, 2023, 546
  • [8] Cross-modal collaborative feature representation via Transformer-based multimodal mixers for RGB-T crowd
    Kong, Weihang
    Liu, Jiayu
    Hong, Yao
    Li, He
    Shen, Jienan
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2024, 255
  • [9] Adaptive interactive network for RGB-T salient object detection with double mapping transformer
    Dong, Feng
    Wang, Yuxuan
    Zhu, Jinchao
    Li, Yuehua
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (20) : 59169 - 59193