PATNet: Patch-to-pixel attention-aware transformer network for RGB-D and RGB-T salient object detection

Authors
Jiang, Mingfeng [1]
Ma, Jianhua [1]
Chen, Jiatong [1]
Wang, Yaming [1,2]
Fang, Xian [1]
Affiliations
[1] Zhejiang Sci Tech Univ, Sch Mat Sci & Engn, Hangzhou 310018, Peoples R China
[2] Lishui Univ, Zhejiang Key Lab DDIMCCP, Lishui, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Saliency detection; Multimodal features; Pyramid pooling transformer; Fine-grained features; Attention mechanism;
DOI
10.1016/j.knosys.2024.111597
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multimodal salient object detection (SOD) combines images from different modalities to generate a saliency map that highlights the most visually attractive objects. When fusing multimodal and multiscale features, preserving both the integrity and the fine granularity of the target is critical to the performance of multimodal SOD. However, differences in fine-grained information across modalities, together with the patch-level size of transformer features, prevent most existing methods from guaranteeing both. We therefore propose a patch-to-pixel attention-aware transformer network (PATNet), which preserves the integrity and fine-grained details of the saliency map by employing a decision-transformation strategy that maps global patches onto local pixels. Specifically, PATNet consists of a shared attention fusion module (SAFM), an adjacent modeling fusion module (AMFM), and a fine-grained mapping module (FMM). SAFM enhances the consistency between multimodal features through a shared attention matrix and an identical convolutional feed-forward network. AMFM enhances low-resolution features by modeling neighboring features, which avoids the aliasing effect of upsampling. In the output stage, FMM maps the patch-represented feature maps onto pixels and restores the details of salient objects. Extensive experimental results demonstrate that PATNet outperforms 24 state-of-the-art methods on six RGB-D and three RGB-T datasets. The source code is publicly available at https://github.com/LitterMa-820/PATNet.
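The abstract names the modules but gives no equations. As a rough illustration only, the NumPy sketch shows one possible reading of two of its ideas: a single attention matrix shared by the RGB and depth/thermal token sets, followed by a weight-shared feed-forward step, and a depth-to-space rearrangement that turns per-patch predictions into a pixel map. The function names `shared_attention_fusion` and `patches_to_pixels`, the summed-token queries and keys, the linear+ReLU stand-in for the convolutional FFN, and the residual connections are all hypothetical assumptions; none of them is taken from the PATNet paper or repository.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def shared_attention_fusion(x_rgb, x_aux, w_q, w_k, w_v, w_ffn):
    """Hypothetical SAFM-style fusion (illustrative assumption, not PATNet's code).

    x_rgb, x_aux: (N, C) patch tokens from the RGB and depth/thermal branches.
    A single N x N attention matrix is built from both modalities and applied
    to each, and the same feed-forward weights are used for both outputs.
    """
    c = x_rgb.shape[1]
    joint = x_rgb + x_aux                      # assumed way of mixing the modalities
    q, k = joint @ w_q, joint @ w_k
    attn = softmax(q @ k.T / np.sqrt(c))       # shared attention matrix, rows sum to 1
    out_rgb = attn @ (x_rgb @ w_v)
    out_aux = attn @ (x_aux @ w_v)

    def ffn(t):
        # Weight-shared stand-in for the paper's convolutional FFN: linear + ReLU.
        return np.maximum(t @ w_ffn, 0.0)

    # Residual connection around the shared FFN (an assumption of this sketch).
    return attn, out_rgb + ffn(out_rgb), out_aux + ffn(out_aux)


def patches_to_pixels(tokens, grid_h, grid_w, p):
    """Rearrange per-patch predictions into a pixel map (depth-to-space).

    tokens: (grid_h * grid_w, p * p); row i * grid_w + j holds the p x p pixel
    values predicted for patch (i, j) in row-major order.
    Returns an array of shape (grid_h * p, grid_w * p).
    """
    x = tokens.reshape(grid_h, grid_w, p, p)
    x = x.transpose(0, 2, 1, 3)                # (grid_h, p, grid_w, p)
    return x.reshape(grid_h * p, grid_w * p)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, c = 16, 8                               # toy sizes: 4 x 4 patch grid, 8 channels
    x_rgb = rng.standard_normal((n, c))
    x_aux = rng.standard_normal((n, c))
    weights = [0.1 * rng.standard_normal((c, c)) for _ in range(4)]
    attn, f_rgb, f_aux = shared_attention_fusion(x_rgb, x_aux, *weights)
    print(attn.shape, f_rgb.shape)             # (16, 16) (16, 8)
    # One scalar per pixel: 16 patches x (2 x 2) pixels -> an 8 x 8 map.
    saliency = patches_to_pixels(rng.random((n, 4)), 4, 4, 2)
    print(saliency.shape)                      # (8, 8)
```

Sharing `attn` means both modalities aggregate information from the same token positions with the same weights, which is one way to encourage the cross-modal consistency the abstract attributes to SAFM; how PATNet actually computes its shared matrix, and how FMM performs its patch-to-pixel mapping, is not stated in this record.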
Pages: 13