CATNet: A Cascaded and Aggregated Transformer Network for RGB-D Salient Object Detection

Cited by: 19
Authors
Sun, Fuming [1]
Ren, Peng [1]
Yin, Bowen [1]
Wang, Fasheng [1]
Li, Haojie [2]
Affiliations
[1] Dalian Minzu Univ, Sch Informat & Commun Engn, Dalian 116620, Peoples R China
[2] Dalian Univ Technol, DUT RU Int Sch Informat Sci & Engn, Dalian 116620, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Swin Transformer; salient object detection; multi-scale features; attention; decoder
DOI
10.1109/TMM.2023.3294003
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Salient object detection (SOD) is an important preprocessing operation for various computer vision tasks. Most existing RGB-D SOD models employ addition or concatenation strategies to directly aggregate and decode multi-scale features to predict saliency maps. However, due to the large differences between features at different scales, these aggregation strategies may lead to information loss or redundancy. Moreover, few methods explicitly consider how to establish connections between features at different scales in the decoding process, which consequently degrades the detection performance of the models. To this end, we propose a Cascaded and Aggregated Transformer Network (CATNet), which consists of three key modules: an attention feature enhancement module (AFEM), a cross-modal fusion module (CMFM), and a cascaded correction decoder (CCD). Specifically, the AFEM is designed on the basis of atrous spatial pyramid pooling to obtain multi-scale semantic information and global context information in high-level features through dilated convolution and a multi-head self-attention mechanism, thereby enhancing the high-level features. The role of the CMFM is to enhance and then fuse the RGB features and depth features, alleviating the problem of poor-quality depth maps. The CCD is composed of two subdecoders arranged in a cascade. It is designed to suppress noise in low-level features and mitigate the differences between features at different scales. Moreover, the CCD uses a feedback mechanism to correct and repair the output of the subdecoder by exploiting supervised features, so that the information loss caused by the upsampling operation during multi-scale feature aggregation can be mitigated. Extensive experimental results demonstrate that the proposed CATNet achieves superior performance over 14 state-of-the-art RGB-D methods on 7 challenging benchmarks.
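The abstract states that the AFEM builds on atrous spatial pyramid pooling, which applies dilated convolutions at several rates in parallel. The following is a minimal one-dimensional sketch of that general idea in plain Python, not the authors' implementation. The function names (`dilated_conv1d`, `receptive_field`, `aspp_like`), the choice of rates, and the averaging of branches are all assumptions made for illustration.

```python
def dilated_conv1d(signal, kernel, rate):
    """1-D dilated (atrous) convolution with zero padding.

    Uses the deep-learning convention (cross-correlation, no kernel flip).
    The output has the same length as the input.
    """
    n = len(signal)
    center = len(kernel) // 2
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - center) * rate  # taps are spaced `rate` samples apart
            if 0 <= idx < n:               # samples outside the signal count as zero
                acc += w * signal[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, rate):
    """Span of input samples covered by one dilated kernel."""
    return (kernel_size - 1) * rate + 1


def aspp_like(signal, kernel, rates):
    """Run parallel dilated branches and average them.

    Averaging is an assumption for this sketch; real ASPP variants
    typically concatenate the branches and mix them with a 1x1 convolution.
    """
    branches = [dilated_conv1d(signal, kernel, r) for r in rates]
    return [sum(vals) / len(branches) for vals in zip(*branches)]


if __name__ == "__main__":
    impulse = [0, 0, 0, 1, 0, 0, 0]
    for rate in (1, 2):
        print(rate, receptive_field(3, rate), dilated_conv1d(impulse, [1, 1, 1], rate))
    print(aspp_like(impulse, [1, 1, 1], rates=(1, 2)))
```

A larger rate widens the span of input that each output position sees without adding kernel weights, which is why combining several rates yields multi-scale context.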
Pages: 2249-2262
Page count: 14