PATNet: Patch-to-pixel attention-aware transformer network for RGB-D and RGB-T salient object detection

Cited by: 1

Authors:

Jiang, Mingfeng [1]

Ma, Jianhua [1]

Chen, Jiatong [1]

Wang, Yaming [1,2]

Fang, Xian [1]

Affiliations:

[1] Zhejiang Sci Tech Univ, Sch Mat Sci & Engn, Hangzhou 310018, Peoples R China

[2] Lishui Univ, Zhejiang Key Lab DDIMCCP, Lishui, Peoples R China

Source:

KNOWLEDGE-BASED SYSTEMS | 2024 / Vol. 291

Funding:

National Natural Science Foundation of China;

Keywords:

Saliency detection; Multimodal features; Pyramid pooling transformer; Fine-grained features; Attention mechanism;

DOI:

10.1016/j.knosys.2024.111597

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Multimodal salient object detection (SOD) combines different modal images to generate the most visually appealing saliency map. When fusing multimodal and multiscale features, maintaining the integrity and fine granularity of the target is critical for improving the performance of multimodal SOD. The fine-grained information differences between the modalities and the size of the features in the transformer prevent most existing studies from guaranteeing both granularities. Therefore, we propose a patch-to-pixel attention-aware transformer network (PATNet) to overcome these problems, whereby the integrity and fine-grained details of the saliency map are preserved by employing a decision-transformation strategy to map global patches onto local pixels. Specifically, PATNet consists of the shared attention fusion module (SAFM), adjacent modeling fusion module (AMFM), and fine-grained mapping module (FMM). SAFM enhances the consistency between multimodal features through a shared attention matrix and an identical convolutional feed-forward network. Meanwhile, AMFM enhances low-resolution features by modeling neighboring features to avoid the aliasing effect of upsampling. In the output stage, FMM is responsible for mapping the feature maps represented by patches onto pixels and restoring the salient object details. Numerous experimental results demonstrate that PATNet outperforms 24 state-of-the-art methods on six RGB-D and three RGB-T datasets. The source code is publicly available at https://github.com/LitterMa-820/PATNet.
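
The abstract says that SAFM applies a shared attention matrix to both modalities. The sketch below illustrates only that general idea and is not the authors' implementation (see the linked repository for that). It assumes, purely for illustration, that the shared matrix is computed from the element-wise sum of the two token sets and that there are no learned projections; the names `shared_attention`, `rgb_tokens`, and `aux_tokens` are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def shared_attention(rgb_tokens, aux_tokens):
    """Apply ONE attention matrix to both modalities.

    Assumption (not from the paper): the matrix is built from the
    element-wise sum of the RGB and auxiliary (depth or thermal) tokens,
    using scaled dot-product scores without learned projections.
    """
    dim = len(rgb_tokens[0])
    fused = [[r + a for r, a in zip(rt, at)]
             for rt, at in zip(rgb_tokens, aux_tokens)]
    scores = matmul(fused, transpose(fused))
    attn = [softmax([s / math.sqrt(dim) for s in row]) for row in scores]
    # The same attention weights re-mix the tokens of each modality.
    return matmul(attn, rgb_tokens), matmul(attn, aux_tokens), attn
```

Because both outputs are mixed with the same weights, a token position that attends strongly to another position does so identically in the RGB and auxiliary streams, which is one simple way to read "consistency between multimodal features".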

Pages: 13

