CATNet: A Cascaded and Aggregated Transformer Network for RGB-D Salient Object Detection

Cited by: 19
Authors
Sun, Fuming [1]
Ren, Peng [1]
Yin, Bowen [1]
Wang, Fasheng [1]
Li, Haojie [2]
Affiliations
[1] Dalian Minzu Univ, Sch Informat & Commun Engn, Dalian 116620, Peoples R China
[2] Dalian Univ Technol, DUT RU Int Sch Informat Sci & Engn, Dalian 116620, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Swin Transformer; salient object detection; multi-scale features; attention; decoder;
DOI
10.1109/TMM.2023.3294003
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Salient object detection (SOD) is an important preprocessing step for various computer vision tasks. Most existing RGB-D SOD models aggregate and decode multi-scale features directly, using addition or concatenation, to predict saliency maps. However, because features at different scales differ greatly, these aggregation strategies may cause information loss or redundancy. Moreover, few methods explicitly consider how to connect features at different scales during decoding. Both issues degrade the detection performance of the models. To this end, we propose a cascaded and aggregated Transformer network (CATNet), which consists of three key modules: an attention feature enhancement module (AFEM), a cross-modal fusion module (CMFM) and a cascaded correction decoder (CCD). Specifically, the AFEM builds on atrous spatial pyramid pooling and enhances high-level features by extracting multi-scale semantic information and global context information through dilated convolution and multi-head self-attention. The CMFM enhances and then fuses the RGB and depth features, alleviating the problem of poor-quality depth maps. The CCD is composed of two cascaded subdecoders and is designed to suppress noise in low-level features and to reduce the differences between features at different scales. In addition, the CCD uses a feedback mechanism that exploits supervised features to correct and repair the output of the subdecoder, mitigating the information loss caused by upsampling during multi-scale feature aggregation. Extensive experiments show that the proposed CATNet outperforms 14 state-of-the-art RGB-D methods on 7 challenging benchmarks.
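The AFEM described in the abstract pairs dilated convolutions (the idea behind atrous spatial pyramid pooling) with self-attention for global context. The pure-Python toy below illustrates only that general mechanism on a 1D signal; it is not the CATNet implementation. The dilation rates (1, 2, 4), the fixed averaging kernel, the single attention head with identity query/key/value projections, and fusion by simple averaging are all hypothetical simplifications; the paper's module works on 2D multi-channel feature maps with learned weights and multi-head attention.

```python
import math
from typing import List, Sequence, Tuple


def effective_kernel_size(k: int, dilation: int) -> int:
    """Span covered by a k-tap kernel whose taps are `dilation` positions apart."""
    return k + (k - 1) * (dilation - 1)


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """Zero-padded 'same' 1D cross-correlation (odd-length kernel) with a dilation rate."""
    pad = (len(kernel) - 1) * dilation // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i - pad + j * dilation  # taps skip `dilation - 1` positions between them
            if 0 <= idx < len(x):  # positions outside the signal count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def self_attention1d(x: Sequence[float]) -> List[float]:
    """Single-head softmax self-attention over scalar tokens.

    Hypothetical simplification: Q = K = V = x (identity projections) and the
    1/sqrt(d) scaling is omitted because d = 1. Every output position attends
    to every input position, which is what supplies global context.
    """
    out = []
    for q in x:
        scores = [q * k for k in x]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append(sum(w * v for w, v in zip(weights, x)) / z)
    return out


def toy_afem(x: Sequence[float], rates: Tuple[int, ...] = (1, 2, 4)) -> List[float]:
    """Hypothetical toy: average parallel dilated branches with one attention branch."""
    kernel = [1 / 3, 1 / 3, 1 / 3]  # fixed averaging kernel; a real network learns these weights
    branches = [dilated_conv1d(x, kernel, r) for r in rates]  # local context at several scales
    branches.append(self_attention1d(x))  # global context
    return [sum(vals) / len(branches) for vals in zip(*branches)]
```

With a 3-tap kernel, rates 1, 2 and 4 cover spans of 3, 5 and 9 positions while each branch still uses only three weights; this is why dilated branches are a cheap way to gather multi-scale context, and the attention branch adds interactions between all positions regardless of distance.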
Pages: 2249-2262
Number of pages: 14
Related papers
50 in total
  • [1] GroupTransNet: Group transformer network for RGB-D salient object detection
    Fang, Xian
    Jiang, Mingfeng
    Zhu, Jinchao
    Shao, Xiuli
    Wang, Hongpeng
    [J]. NEUROCOMPUTING, 2024, 594
  • [2] TriTransNet: RGB-D Salient Object Detection with a Triplet Transformer Embedding Network
    Liu, Zhengyi
    Wang, Yuan
    Tu, Zhengzheng
    Xiao, Yun
    Tang, Bin
    [J]. PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 4481 - 4490
  • [3] A cascaded refined rgb-d salient object detection network based on the attention mechanism
    Zong, Guanyu
    Wei, Longsheng
    Guo, Siyuan
    Wang, Yongtao
    [J]. APPLIED INTELLIGENCE, 2023, 53 (11) : 13527 - 13548
  • [5] TANet: Transformer-based asymmetric network for RGB-D salient object detection
    Liu, Chang
    Yang, Gang
    Wang, Shuo
    Wang, Hangxu
    Zhang, Yunhua
    Wang, Yutao
    [J]. IET COMPUTER VISION, 2023, 17 (04) : 415 - 430
  • [6] Transformer-based difference fusion network for RGB-D salient object detection
    Cui, Zhi-Qiang
    Wang, Feng
    Feng, Zheng-Yong
    [J]. JOURNAL OF ELECTRONIC IMAGING, 2022, 31 (06)
  • [7] Depth Enhanced Cross-Modal Cascaded Network for RGB-D Salient Object Detection
    Zhao, Zhengyun
    Huang, Ziqing
    Chai, Xiuli
    Wang, Jun
    [J]. NEURAL PROCESSING LETTERS, 2023, 55 (01) : 361 - 384
  • [8] MULTI-MODAL TRANSFORMER FOR RGB-D SALIENT OBJECT DETECTION
    Song, Peipei
    Zhang, Jing
    Koniusz, Piotr
    Barnes, Nick
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 2466 - 2470
  • [10] AirSOD: A Lightweight Network for RGB-D Salient Object Detection
    Zeng, Zhihong
    Liu, Haijun
    Chen, Fenglei
    Tan, Xiaoheng
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (03) : 1656 - 1669