RGB-Sonar Tracking Benchmark and Spatial Cross-Attention Transformer Tracker

Cited by: 0
Authors
Li, Yunfeng [1 ]
Wang, Bo [1 ]
Sun, Jiuran [1 ]
Wu, Xueyi [1 ]
Li, Ye [1 ]
Affiliations
[1] Harbin Engn Univ, Natl Key Lab Autonomous Marine Vehicle Technol, Harbin 150001, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
RGB-sonar tracking; spatial cross attention; transformer network;
DOI
10.1109/TCSVT.2024.3497214
CLC classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline codes
0808; 0809;
Abstract
Underwater cameras and sonar are naturally complementary in the underwater environment, and combining information from the two modalities promotes better observation of underwater targets. However, this problem has received little attention in previous research. Therefore, this paper introduces a new and challenging RGB-Sonar (RGB-S) tracking task and investigates how to achieve efficient tracking of an underwater target through the interaction of the RGB and sonar modalities. Specifically, we first propose an RGBS50 benchmark dataset containing 50 sequences and more than 87,000 high-quality annotated bounding boxes. Experimental results show that the RGBS50 benchmark poses significant challenges to currently popular single object tracking (SOT) trackers. Second, we propose two RGB-S trackers, called SCANet and SCANet-Refine. They include a spatial cross-attention module (SCAM) consisting of a novel spatial cross-attention layer, an attention refinement module, and two independent global integration modules. The spatial cross-attention is used to overcome the problem of spatial misalignment between RGB and sonar images. Third, we propose a SOT data-based RGB-S simulation training method (SRST) to overcome the lack of RGB-S training datasets. It converts RGB images into sonar-like saliency images to construct pseudo-data pairs, enabling the model to learn the semantic structure of RGB-S data. Comprehensive experiments show that the proposed spatial cross-attention effectively achieves the interaction between RGB and sonar modalities, and that SCANet and SCANet-Refine achieve state-of-the-art performance on the proposed benchmark. The code is available at https://github.com/LiYunfengLYF/RGBS50.
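The spatial cross-attention described in the abstract lets queries from one modality attend over all spatial positions of the other, so the two feature maps need not be pixel-aligned. The following is a minimal NumPy sketch of that idea, not the authors' SCAM implementation; the function name, projection matrices, and shapes are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_cross_attention(rgb_feat, sonar_feat, w_q, w_k, w_v):
    """Fuse sonar information into RGB tokens via cross-attention.

    rgb_feat:   (N_rgb, C) flattened RGB feature tokens (queries)
    sonar_feat: (N_son, C) flattened sonar feature tokens (keys/values)
    w_q, w_k, w_v: (C, d) projection matrices
    Because each RGB query attends over every sonar position,
    no spatial alignment between the two maps is required.
    """
    q = rgb_feat @ w_q                                        # (N_rgb, d)
    k = sonar_feat @ w_k                                      # (N_son, d)
    v = sonar_feat @ w_v                                      # (N_son, d)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)   # (N_rgb, N_son)
    return attn @ v                                           # (N_rgb, d)
```

A symmetric call with the roles of the two modalities swapped would fuse RGB information into the sonar tokens, giving the bidirectional interaction the abstract describes.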
Pages: 2260 - 2275
Number of pages: 16