RGB-Sonar Tracking Benchmark and Spatial Cross-Attention Transformer Tracker

Cited by: 0
Authors
Li, Yunfeng [1 ]
Wang, Bo [1 ]
Sun, Jiuran [1 ]
Wu, Xueyi [1 ]
Li, Ye [1 ]
Affiliations
[1] Harbin Engn Univ, Natl Key Lab Autonomous Marine Vehicle Technol, Harbin 150001, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
RGB-sonar tracking; spatial cross attention; transformer network;
DOI
10.1109/TCSVT.2024.3497214
CLC classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline codes
0808; 0809;
Abstract
Underwater cameras and sonar are naturally complementary in the underwater environment, and combining information from the two modalities promotes better observation of underwater targets. However, this problem has received little attention in previous research. Therefore, this paper introduces a new and challenging RGB-Sonar (RGB-S) tracking task and investigates how to achieve efficient tracking of an underwater target through the interaction of the RGB and sonar modalities. Specifically, we first propose an RGBS50 benchmark dataset containing 50 sequences and more than 87,000 high-quality annotated bounding boxes. Experimental results show that the RGBS50 benchmark poses significant challenges to currently popular single object tracking (SOT) trackers. Second, we propose two RGB-S trackers, called SCANet and SCANet-Refine. They include a spatial cross-attention module (SCAM) consisting of a novel spatial cross-attention layer, an attention refinement module, and two independent global integration modules. The spatial cross-attention is used to overcome the problem of spatial misalignment between RGB and sonar images. Third, we propose a SOT data-based RGB-S simulation training method (SRST) to overcome the lack of RGB-S training datasets. It converts RGB images into sonar-like saliency images to construct pseudo-data pairs, enabling the model to learn the semantic structure of RGB-S data. Comprehensive experiments show that the proposed spatial cross-attention effectively achieves the interaction between RGB and sonar modalities, and that SCANet and SCANet-Refine achieve state-of-the-art performance on the proposed benchmark. The code is available at https://github.com/LiYunfengLYF/RGBS50.
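The spatial cross-attention described in the abstract lets queries from one modality attend over all spatial positions of the other, so the two feature maps need not be pixel-aligned. The following is a minimal NumPy sketch of that idea, not the authors' SCAM implementation; the function name, projection matrices, and shapes are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_cross_attention(rgb_feat, sonar_feat, w_q, w_k, w_v):
    """Fuse sonar information into RGB tokens via cross-attention.

    rgb_feat:   (N_rgb, C) flattened RGB feature tokens (queries)
    sonar_feat: (N_son, C) flattened sonar feature tokens (keys/values)
    w_q, w_k, w_v: (C, d) projection matrices
    Because each RGB query attends over every sonar position,
    no spatial alignment between the two maps is required.
    """
    q = rgb_feat @ w_q                                        # (N_rgb, d)
    k = sonar_feat @ w_k                                      # (N_son, d)
    v = sonar_feat @ w_v                                      # (N_son, d)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)   # (N_rgb, N_son)
    return attn @ v                                           # (N_rgb, d)
```

A symmetric call with the roles of the two modalities swapped would fuse RGB information into the sonar tokens, giving the bidirectional interaction the abstract describes.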
Pages: 2260 - 2275
Number of pages: 16