SwinNet: Swin Transformer Drives Edge-Aware RGB-D and RGB-T Salient Object Detection

Cited by: 163
Authors
Liu, Zhengyi [1 ]
Tan, Yacheng [1 ]
He, Qian [1 ]
Xiao, Yun [2 ]
Affiliations
[1] Anhui Univ, Sch Comp Sci & Technol, Key Lab Intelligent Comp & Signal Proc, Minist Educ, Hefei 230601, Peoples R China
[2] Anhui Univ, Sch Artificial Intelligence, Key Lab Intelligent Comp & Signal Proc, Minist Educ, Hefei 230601, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Transformer; salient object detection; RGB-D; RGB-T; multi-modality; NETWORK; IMAGE; MODEL;
DOI
10.1109/TCSVT.2021.3127149
CLC Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Subject Classification Codes
0808; 0809;
Abstract
Convolutional neural networks (CNNs) are good at extracting contextual features within certain receptive fields, while transformers can model global long-range dependencies. By absorbing the advantages of the transformer and the merits of the CNN, the Swin Transformer shows strong feature representation ability. Based on it, we propose a cross-modality fusion model, SwinNet, for RGB-D and RGB-T salient object detection. It is driven by the Swin Transformer to extract hierarchical features, boosted by an attention mechanism to bridge the gap between the two modalities, and guided by edge information to sharpen the contour of the salient object. To be specific, a two-stream Swin Transformer encoder first extracts multi-modality features, and then a spatial alignment and channel re-calibration module is presented to optimize intra-level cross-modality features. To clarify the fuzzy boundary, an edge-guided decoder achieves inter-level cross-modality fusion under the guidance of edge features. The proposed model outperforms state-of-the-art models on RGB-D and RGB-T datasets, showing that it provides more insight into the cross-modality complementarity task.
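The abstract outlines three components: a two-stream Swin Transformer encoder, an intra-level spatial alignment and channel re-calibration module, and an edge-guided decoder. The sketch below illustrates only the middle idea in PyTorch, assuming concatenation followed by spatial attention and squeeze-and-excitation-style channel gating; the module name `CrossModalityFusion`, the layer choices, and the channel sizes are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of intra-level cross-modality fusion (spatial alignment +
# channel re-calibration) as described in the abstract. All layer choices and
# dimensions are assumptions for illustration only.
import torch
import torch.nn as nn


class CrossModalityFusion(nn.Module):
    """Fuse same-level RGB and depth/thermal features from a two-stream encoder."""

    def __init__(self, channels: int):
        super().__init__()
        # Spatial alignment: predict a shared spatial attention map from both streams.
        self.spatial = nn.Sequential(
            nn.Conv2d(2 * channels, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )
        # Channel re-calibration: squeeze-and-excitation-style gating over both streams.
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels // 4, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, 2 * channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb: torch.Tensor, aux: torch.Tensor) -> torch.Tensor:
        x = torch.cat([rgb, aux], dim=1)          # (B, 2C, H, W)
        x = x * self.spatial(x)                   # emphasize spatially consistent regions
        x = x * self.channel(x)                   # re-weight channels of both modalities
        rgb_w, aux_w = torch.chunk(x, 2, dim=1)   # split back per modality
        return rgb_w + aux_w                      # fused intra-level feature


if __name__ == "__main__":
    # Example: fuse one level of hierarchical features (e.g. stride-8, 256 channels).
    rgb_feat = torch.randn(2, 256, 48, 48)
    aux_feat = torch.randn(2, 256, 48, 48)
    fused = CrossModalityFusion(256)(rgb_feat, aux_feat)
    print(fused.shape)  # torch.Size([2, 256, 48, 48])
```

In the paper's terminology, one such block would be applied at every encoder level before the edge-guided decoder performs inter-level fusion.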
Pages: 4486 - 4497
Number of pages: 12
Related Papers
50 records in total
  • [1] EM-Trans: Edge-Aware Multimodal Transformer for RGB-D Salient Object Detection
    Chen, Geng
    Wang, Qingyue
    Dong, Bo
    Ma, Ruitao
    Liu, Nian
    Fu, Huazhu
    Xia, Yong
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, : 1 - 14
  • [2] Saliency Prototype for RGB-D and RGB-T Salient Object Detection
    Zhang, Zihao
    Wang, Jie
    Han, Yahong
    [J]. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 3696 - 3705
  • [3] PATNet: Patch-to-pixel attention-aware transformer network for RGB-D and RGB-T salient object detection
    Jiang, Mingfeng
    Ma, Jianhua
    Chen, Jiatong
    Wang, Yaming
    Fang, Xian
    [J]. KNOWLEDGE-BASED SYSTEMS, 2024, 291
  • [4] Swin Transformer-Based Edge Guidance Network for RGB-D Salient Object Detection
    Wang, Shuaihui
    Jiang, Fengyi
    Xu, Boqian
    [J]. SENSORS, 2023, 23 (21)
  • [5] Feature aggregation with transformer for RGB-T salient object detection
    Zhang, Ping
    Xu, Mengnan
    Zhang, Ziyan
    Gao, Pan
    Zhang, Jing
    [J]. NEUROCOMPUTING, 2023, 546
  • [6] Modality-Induced Transfer-Fusion Network for RGB-D and RGB-T Salient Object Detection
    Chen, Gang
    Shao, Feng
    Chai, Xiongli
    Chen, Hangwei
    Jiang, Qiuping
    Meng, Xiangchao
    Ho, Yo-Sung
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (04) : 1787 - 1801
  • [7] UMINet: a unified multi-modality interaction network for RGB-D and RGB-T salient object detection
    Gao, Lina
    Fu, Ping
    Xu, Mingzhu
    Wang, Tiantian
    Liu, Bing
    [J]. VISUAL COMPUTER, 2024, 40 (03): : 1565 - 1582
  • [8] Unified Information Fusion Network for Multi-Modal RGB-D and RGB-T Salient Object Detection
    Gao, Wei
    Liao, Guibiao
    Ma, Siwei
    Li, Ge
    Liang, Yongsheng
    Lin, Weisi
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (04) : 2091 - 2106
  • [9] Dual Swin-transformer based mutual interactive network for RGB-D salient object detection
    Zeng, Chao
    Kwong, Sam
    Ip, Horace
    [J]. NEUROCOMPUTING, 2023, 559