RGB-Sonar Tracking Benchmark and Spatial Cross-Attention Transformer Tracker

Cited: 0
Authors
Li, Yunfeng [1 ]
Wang, Bo [1 ]
Sun, Jiuran [1 ]
Wu, Xueyi [1 ]
Li, Ye [1 ]
Affiliations
[1] Harbin Engn Univ, Natl Key Lab Autonomous Marine Vehicle Technol, Harbin 150001, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
RGB-sonar tracking; spatial cross attention; transformer network;
DOI
10.1109/TCSVT.2024.3497214
CLC Number
TM [Electrical Engineering]; TN [Electronic and Communication Technology];
Discipline Code
0808; 0809;
Abstract
Underwater cameras and sonar are naturally complementary in the underwater environment, and combining information from the two modalities enables better observation of underwater targets. However, this problem has received little attention in previous research. Therefore, this paper introduces a new and challenging RGB-Sonar (RGB-S) tracking task and investigates how to achieve efficient tracking of an underwater target through the interaction of the RGB and sonar modalities. Specifically, we first propose an RGBS50 benchmark dataset containing 50 sequences and more than 87,000 high-quality annotated bounding boxes. Experimental results show that the RGBS50 benchmark poses significant challenges to currently popular single object tracking (SOT) trackers. Second, we propose two RGB-S trackers, called SCANet and SCANet-Refine. They include a spatial cross-attention module (SCAM) consisting of a novel spatial cross-attention layer, an attention refinement module, and two independent global integration modules. The spatial cross-attention is used to overcome the spatial misalignment between RGB and sonar images. Third, we propose an SOT data-based RGB-S simulation training method (SRST) to overcome the lack of RGB-S training datasets. It converts RGB images into sonar-like saliency images to construct pseudo-data pairs, enabling the model to learn the semantic structure of RGB-S data. Comprehensive experiments show that the proposed spatial cross-attention effectively achieves the interaction between the RGB and sonar modalities, and that SCANet and SCANet-Refine achieve state-of-the-art performance on the proposed benchmark. The code is available at https://github.com/LiYunfengLYF/RGBS50.
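The core mechanism the abstract describes — letting each position in one modality attend over all spatial positions of the other to compensate for misalignment — can be illustrated with a minimal NumPy sketch of generic spatial cross-attention. This is a textbook cross-attention layer, not the authors' exact SCAM implementation; the function name, shapes, and the choice of sonar-as-query / RGB-as-key-value are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_cross_attention(f_rgb, f_sonar):
    """Generic spatial cross-attention: sonar tokens query RGB tokens.

    f_rgb:   (N_rgb, C)  flattened RGB feature map (one token per spatial position)
    f_sonar: (N_son, C)  flattened sonar feature map
    Returns: (N_son, C)  sonar features enriched with RGB context; because every
             sonar position attends over ALL RGB positions, no pixelwise
             alignment between the two images is required.
    """
    q = f_sonar                    # queries from sonar positions
    k, v = f_rgb, f_rgb            # keys/values from RGB positions
    d_k = f_rgb.shape[1]
    attn = softmax(q @ k.T / np.sqrt(d_k), axis=-1)  # (N_son, N_rgb)
    return attn @ v                # weighted RGB context per sonar position

# Toy usage: a 4x4 RGB map and a 3x3 sonar map, both with 8 channels, flattened.
rng = np.random.default_rng(0)
out = spatial_cross_attention(rng.normal(size=(16, 8)), rng.normal(size=(9, 8)))
print(out.shape)  # (9, 8): one fused token per sonar position
```

In practice such a layer would use learned query/key/value projections and be applied symmetrically in both directions; the point here is only that attention over flattened spatial tokens sidesteps the RGB-sonar misalignment problem the paper targets.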
Pages: 2260-2275
Page count: 16
Related Papers
50 records
  • [31] Cross-Attention Spectral-Spatial Network for Hyperspectral Image Classification
    Yang, Kai
    Sun, Hao
    Zou, Chunbo
    Lu, Xiaoqiang
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
  • [32] Hyperspectral Image Classification via Cascaded Spatial Cross-Attention Network
    Zhang, Bo
    Chen, Yaxiong
    Xiong, Shengwu
    Lu, Xiaoqiang
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2025, 34 : 899 - 913
  • [33] Word2Pix: Word to Pixel Cross-Attention Transformer in Visual Grounding
    Zhao, Heng
    Zhou, Joey Tianyi
    Ong, Yew-Soon
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (02) : 1523 - 1533
  • [34] TSMCF: Transformer-Based SAR and Multispectral Cross-Attention Fusion for Cloud Removal
    Zhu, Hongming
    Wang, Zeju
    Han, Letong
    Xu, Manxin
    Li, Weiqi
    Liu, Qin
    Liu, Sicong
    Du, Bowen
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2025, 18 : 6710 - 6720
  • [35] Learning Cross-Attention Discriminators via Alternating TimeSpace Transformers for Visual Tracking
    Wang, Wuwei
    Zhang, Ke
    Su, Yu
    Wang, Jingyu
    Wang, Qi
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (11) : 15156 - 15169
  • [36] Spatio-spectral Cross-Attention Transformer for Hyperspectral image and Multispectral image fusion
    Qin, Xilei
    Song, Huihui
    Fan, Jiaqing
    Zhang, Kaihua
    REMOTE SENSING LETTERS, 2023, 14 (12) : 1303 - 1314
  • [37] Cross-attention Spatio-temporal Context Transformer for Semantic Segmentation of Historical Maps
    Wu, Sidi
    Chen, Yizi
    Schindler, Konrad
    Hurni, Lorenz
    31ST ACM SIGSPATIAL INTERNATIONAL CONFERENCE ON ADVANCES IN GEOGRAPHIC INFORMATION SYSTEMS, ACM SIGSPATIAL GIS 2023, 2023, : 106 - 114
  • [38] Remote sensing image change detection based on swin transformer and cross-attention mechanism
    Yan, Weidong
    Cao, Li
    Yan, Pei
    Zhu, Chaosheng
    Wang, Mengtian
    EARTH SCIENCE INFORMATICS, 2025, 18 (01)
  • [39] MedTrans: Intelligent Computing for Medical Diagnosis Using Multiscale Cross-Attention Vision Transformer
    Xu, Yang
    Hong, Yuan
    Li, Xinchen
    Hu, Mu
    IEEE ACCESS, 2024, 12 : 146575 - 146586
  • [40] Reducing carbon emissions in the architectural design process via transformer with cross-attention mechanism
    Li, Huadong
    Yang, Xia
    Zhu, Hai Luo
    FRONTIERS IN ECOLOGY AND EVOLUTION, 2023, 11