RGB-Sonar Tracking Benchmark and Spatial Cross-Attention Transformer Tracker

Cited by: 0
Authors
Li, Yunfeng [1 ]
Wang, Bo [1 ]
Sun, Jiuran [1 ]
Wu, Xueyi [1 ]
Li, Ye [1 ]
Affiliations
[1] Harbin Engn Univ, Natl Key Lab Autonomous Marine Vehicle Technol, Harbin 150001, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
RGB-sonar tracking; spatial cross attention; transformer network;
DOI
10.1109/TCSVT.2024.3497214
CLC number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline code
0808 ; 0809 ;
Abstract
Underwater cameras and sonar are naturally complementary in the underwater environment, and combining information from the two modalities enables better observation of underwater targets. However, this problem has received little attention in previous research. This paper therefore introduces a new and challenging RGB-Sonar (RGB-S) tracking task and investigates how to achieve efficient tracking of an underwater target through the interaction of the RGB and sonar modalities. Specifically, we first propose the RGBS50 benchmark dataset, containing 50 sequences and more than 87,000 high-quality annotated bounding boxes. Experimental results show that RGBS50 poses significant challenges to currently popular single-object tracking (SOT) trackers. Second, we propose two RGB-S trackers, called SCANet and SCANet-Refine. They include a spatial cross-attention module (SCAM) consisting of a novel spatial cross-attention layer, an attention refinement module, and two independent global integration modules. The spatial cross-attention is used to overcome the spatial misalignment between RGB and sonar images. Third, we propose an SOT-data-based RGB-S simulation training method (SRST) to overcome the lack of RGB-S training datasets: it converts RGB images into sonar-like saliency images to construct pseudo data pairs, enabling the model to learn the semantic structure of RGB-S data. Comprehensive experiments show that the proposed spatial cross-attention effectively achieves interaction between the RGB and sonar modalities, and that SCANet and SCANet-Refine achieve state-of-the-art performance on the proposed benchmark. The code is available at https://github.com/LiYunfengLYF/RGBS50.
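The core idea in the abstract, letting each RGB feature location attend to every sonar feature location so the two modalities can interact despite spatial misalignment, can be illustrated with a minimal NumPy sketch of one direction of scaled dot-product cross-attention. This is an illustration only, not the authors' SCAM implementation: the function name, feature shapes, and the absence of learned projections, refinement, and global integration modules are all simplifications.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_cross_attention(rgb_feat, sonar_feat):
    """One direction of spatial cross-attention: RGB tokens query sonar tokens.

    rgb_feat:   (N_rgb, d)   flattened RGB feature tokens
    sonar_feat: (N_sonar, d) flattened sonar feature tokens
    Returns fused features of shape (N_rgb, d).
    """
    d = rgb_feat.shape[-1]
    # Queries come from one modality, keys/values from the other, so each
    # RGB location can attend to every sonar location regardless of where
    # the target sits in either image (i.e., no spatial alignment assumed).
    attn = softmax(rgb_feat @ sonar_feat.T / np.sqrt(d))  # (N_rgb, N_sonar)
    return attn @ sonar_feat

rng = np.random.default_rng(0)
rgb = rng.standard_normal((16, 8))    # e.g. a 4x4 RGB feature map, 8 channels
sonar = rng.standard_normal((25, 8))  # e.g. a 5x5 sonar feature map, 8 channels
fused = spatial_cross_attention(rgb, sonar)
print(fused.shape)  # (16, 8)
```

In the paper's setting this would run in both directions (RGB→sonar and sonar→RGB) with learned query/key/value projections; the sketch keeps only the attention arithmetic that makes the cross-modal interaction spatially unconstrained.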
Pages: 2260-2275 (16 pages)
Related papers (50 total)
  • [41] DCCAT: Dual-Coordinate Cross-Attention Transformer for thrombus segmentation on coronary OCT. Chu, Miao; De Maria, Giovanni Luigi; Dai, Ruobing; Benenati, Stefano; Yu, Wei; Zhong, Jiaxin; Kotronias, Rafail; Walsh, Jason; Andreaggi, Stefano; Zuccarelli, Vittorio; Chai, Jason; Channon, Keith; Banning, Adrian; Tu, Shengxian. MEDICAL IMAGE ANALYSIS, 2024, 97.
  • [42] Dual cross-attention Transformer network for few-shot image semantic segmentation. Liu, Yu; Guo, Yingchun; Zhu, Ye; Yu, Ming. CHINESE JOURNAL OF LIQUID CRYSTALS AND DISPLAYS, 2024, 39 (11): 1494-1505.
  • [43] Twins transformer: Cross-attention based two-branch transformer network for rotating bearing fault diagnosis. Li, Jie; Bao, Yu; Liu, Wenxin; Ji, Pengxiang; Wang, Lekang; Wang, Zhongbing. MEASUREMENT, 2023, 223.
  • [44] Multi-level Cross-attention Siamese Network For Visual Object Tracking. Zhang, Jianwei; Wang, Jingchao; Zhang, Huanlong; Miao, Mengen; Cai, Zengyu; Chen, Fuguo. KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, 2022, 16 (12): 3976-3990.
  • [45] Invisible gas detection: An RGB-thermal cross attention network and a new benchmark. Wang, Jue; Lin, Yuxiang; Zhao, Qi; Luo, Dong; Chen, Shuaibao; Chen, Wei; Peng, Xiaojiang. COMPUTER VISION AND IMAGE UNDERSTANDING, 2024, 248.
  • [46] DTCA: Dual-Branch Transformer with Cross-Attention for EEG and Eye Movement Data Fusion. Zhang, Xiaoshan; Shi, Enze; Yu, Sigang; Zhang, Shu. MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2024, PT II, 2024, 15002: 141-151.
  • [47] CAT: A Simple yet Effective Cross-Attention Transformer for One-Shot Object Detection. Lin, Wei-Dong; Deng, Yu-Yan; Gao, Yang; Wang, Ning; Liu, Ling-Qiao; Zhang, Lei; Wang, Peng. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2024, 39 (02): 460-471.
  • [48] Cross-Attention Fusion Learning of Transformer-CNN Features for Person Re-Identification. Xiang, Jun; Zhang, Jincheng; Jiang, Xiaoping; Hou, Jianhua. COMPUTER ENGINEERING AND APPLICATIONS, 2024, 60 (16): 94-104.
  • [49] CAF-ViT: A cross-attention based Transformer network for underwater acoustic target recognition. Dong, Wenfeng; Fu, Jin; Zou, Nan; Zhao, Chunpeng; Miao, Yixin; Shen, Zheng. OCEAN ENGINEERING, 2025, 318.
  • [50] KMT-PLL: K-Means Cross-Attention Transformer for Partial Label Learning. Fan, Jinfu; Huang, Linqing; Gong, Chaoyu; You, Yang; Gan, Min; Wang, Zhongjie. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2025, 36 (02): 2789-2800.