CATNet: A Cascaded and Aggregated Transformer Network for RGB-D Salient Object Detection

Times cited: 19

Authors:

Sun, Fuming [1]

Ren, Peng [1]

Yin, Bowen [1]

Wang, Fasheng [1]

Li, Haojie [2]

Affiliations:

[1] Dalian Minzu Univ, Sch Informat & Commun Engn, Dalian 116620, Peoples R China

[2] Dalian Univ Technol, DUT RU Int Sch Informat Sci & Engn, Dalian 116620, Peoples R China

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2024 / Vol. 26

Funding:

National Natural Science Foundation of China;

Keywords:

Swin Transformer; salient object detection; multi-scale features; attention; decoder;

DOI:

10.1109/TMM.2023.3294003

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Salient object detection (SOD) is an important preprocessing operation for various computer vision tasks. Most existing RGB-D SOD models directly aggregate and decode multi-scale features with additive or concatenation strategies to predict saliency maps. However, because features at different scales differ substantially, these aggregation strategies may cause information loss or redundancy, and few methods explicitly consider how to establish connections between features at different scales during decoding, which degrades detection performance. To this end, we propose a cascaded and aggregated Transformer network (CATNet), which consists of three key modules: an attention feature enhancement module (AFEM), a cross-modal fusion module (CMFM), and a cascaded correction decoder (CCD). Specifically, the AFEM builds on atrous spatial pyramid pooling and uses dilated convolutions and multi-head self-attention to capture multi-scale semantic information and global context in high-level features, thereby enhancing them. The CMFM first enhances and then fuses the RGB and depth features, alleviating the problem of poor-quality depth maps. The CCD consists of two cascaded subdecoders and is designed to suppress noise in low-level features and reduce the differences between features at different scales. Moreover, the CCD uses a feedback mechanism that exploits supervised features to correct and repair the output of the subdecoder, mitigating the information loss caused by upsampling during multi-scale feature aggregation. Extensive experiments demonstrate that CATNet outperforms 14 state-of-the-art RGB-D methods on 7 challenging benchmarks.
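The abstract describes the AFEM only at a high level: atrous (dilated) convolution branches in the style of atrous spatial pyramid pooling, plus multi-head self-attention for global context. The NumPy sketch below illustrates those two ingredients on a single (C, H, W) feature map. It is not the authors' implementation. The helper names (`dilated_conv2d`, `multi_head_self_attention`, `afem_sketch`), the dilation rates (1, 3, 5), the depthwise 3x3 kernels with random weights, the concatenation followed by a 1x1 projection, and the residual connection are all hypothetical choices made for illustration.

```python
import numpy as np


def dilated_conv2d(x, kernel, dilation):
    """'Same'-padded 2-D cross-correlation of one channel x (H, W) with an odd k x k kernel."""
    k = kernel.shape[0]
    pad = dilation * (k // 2)
    xp = np.pad(x, pad)  # zero padding keeps the output at (H, W)
    h, w = x.shape
    out = np.zeros((h, w))
    for i in range(k):
        for j in range(k):
            # Taps are spaced `dilation` pixels apart, which enlarges the receptive field.
            r0, c0 = i * dilation, j * dilation
            out += kernel[i, j] * xp[r0:r0 + h, c0:c0 + w]
    return out


def multi_head_self_attention(tokens, wq, wk, wv, num_heads):
    """Scaled dot-product self-attention over N tokens of width C, split into heads."""
    n, c = tokens.shape
    d = c // num_heads

    def split(m):  # (N, C) -> (heads, N, d)
        return m.reshape(n, num_heads, d).transpose(1, 0, 2)

    q, k, v = split(tokens @ wq), split(tokens @ wk), split(tokens @ wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)      # (heads, N, N)
    scores -= scores.max(axis=-1, keepdims=True)         # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)             # softmax over keys
    return (attn @ v).transpose(1, 0, 2).reshape(n, c)   # merge heads back to (N, C)


def afem_sketch(feat, dilations=(1, 3, 5), num_heads=2, seed=0):
    """Hypothetical AFEM-style block: dilated-conv branches plus a global attention branch."""
    rng = np.random.default_rng(seed)
    c, h, w = feat.shape
    branches = []
    for d in dilations:
        kernels = rng.standard_normal((c, 3, 3)) * 0.1   # depthwise 3x3 weights (random)
        branches.append(np.stack([dilated_conv2d(feat[i], kernels[i], d) for i in range(c)]))
    tokens = feat.reshape(c, h * w).T                    # (H*W, C): one token per pixel
    wq, wk, wv = (rng.standard_normal((c, c)) * 0.1 for _ in range(3))
    glob = multi_head_self_attention(tokens, wq, wk, wv, num_heads).T.reshape(c, h, w)
    branches.append(glob)
    stacked = np.concatenate(branches, axis=0)           # ((len(dilations) + 1) * C, H, W)
    proj = rng.standard_normal((c, stacked.shape[0])) * 0.1  # 1x1 conv written as a matrix
    fused = np.einsum("oc,chw->ohw", proj, stacked)
    return feat + fused                                  # residual enhancement (assumed)


if __name__ == "__main__":
    feat = np.random.default_rng(1).standard_normal((4, 8, 8))  # (C, H, W)
    print(afem_sketch(feat).shape)  # (4, 8, 8)
    # Effective kernel size of a 3x3 kernel with dilation d is 3 + 2 * (d - 1).
    print([3 + 2 * (d - 1) for d in (1, 3, 5)])  # [3, 7, 11]
```

The dilated branches see local neighbourhoods of growing size (3, 7, and 11 pixels across) at the same resolution, while the attention branch lets every pixel attend to every other pixel, which is one plausible reading of "multi-scale semantic information and global context information" in the abstract.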


Pages: 2249-2262

Number of pages: 14
