RGB-Sonar Tracking Benchmark and Spatial Cross-Attention Transformer Tracker

Cited by: 0
Authors
Li, Yunfeng [1 ]
Wang, Bo [1 ]
Sun, Jiuran [1 ]
Wu, Xueyi [1 ]
Li, Ye [1 ]
Affiliations
[1] Harbin Engn Univ, Natl Key Lab Autonomous Marine Vehicle Technol, Harbin 150001, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
RGB-sonar tracking; spatial cross attention; transformer network;
DOI
10.1109/TCSVT.2024.3497214
CLC Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Subject Classification Codes
0808; 0809;
Abstract
Underwater cameras and sonar are naturally complementary in the underwater environment, and combining the information from the two modalities promotes better observation of underwater targets. However, this problem has received little attention in previous research. This paper therefore introduces a new and challenging RGB-Sonar (RGB-S) tracking task and investigates how to achieve efficient tracking of an underwater target through the interaction of the RGB and sonar modalities. Specifically, we first propose the RGBS50 benchmark dataset, containing 50 sequences and more than 87,000 high-quality annotated bounding boxes. Experimental results show that RGBS50 poses significant challenges to currently popular single object tracking (SOT) trackers. Second, we propose two RGB-S trackers, SCANet and SCANet-Refine. They include a spatial cross-attention module (SCAM) consisting of a novel spatial cross-attention layer, an attention refinement module, and two independent global integration modules. The spatial cross-attention is used to overcome the spatial misalignment between RGB and sonar images. Third, we propose an SOT-data-based RGB-S simulation training method (SRST) to overcome the lack of RGB-S training datasets. It converts RGB images into sonar-like saliency images to construct pseudo-data pairs, enabling the model to learn the semantic structure of RGB-S data. Comprehensive experiments show that the proposed spatial cross-attention effectively achieves the interaction between the RGB and sonar modalities, and that SCANet and SCANet-Refine achieve state-of-the-art performance on the proposed benchmark. The code is available at https://github.com/LiYunfengLYF/RGBS50.
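The SCAM architecture itself is defined in the paper and the linked repository; purely as an illustration of the underlying idea, a generic single-head, unprojected cross-attention between two modality token sets can be sketched in NumPy. The function name, toy shapes, and the omission of learned Q/K/V projections are assumptions for brevity, not the authors' implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_cross_attention(rgb_feat, sonar_feat):
    """Cross-attention with sonar queries over RGB keys/values.

    rgb_feat:   (N_rgb, C)   array of RGB spatial tokens.
    sonar_feat: (N_sonar, C) array of sonar spatial tokens.

    Each sonar token attends over every RGB spatial position, so the
    fusion does not assume the two images are pixel-aligned -- the
    motivation the paper gives for using spatial cross-attention.
    """
    d_k = rgb_feat.shape[-1]
    # (N_sonar, N_rgb) attention weights; rows sum to 1.
    attn = softmax(sonar_feat @ rgb_feat.T / np.sqrt(d_k), axis=-1)
    # Aggregate RGB values into sonar token positions: (N_sonar, C).
    return attn @ rgb_feat

# Toy example: 16 RGB tokens and 9 sonar tokens, 32 channels each.
rng = np.random.default_rng(0)
rgb = rng.standard_normal((16, 32))
sonar = rng.standard_normal((9, 32))
fused = spatial_cross_attention(rgb, sonar)
print(fused.shape)  # (9, 32)
```

In SCANet the layer is applied with learned projections and paired with the attention refinement and global integration modules described in the abstract; this sketch only shows why cross-attention can bridge spatially misaligned modalities.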
Pages: 2260-2275
Number of pages: 16
Related Papers
(50 total)
  • [1] SCATT: Transformer tracking with symmetric cross-attention
    Zhang, Jianming
    Chen, Wentao
    Dai, Jiangxin
    Zhang, Jin
    APPLIED INTELLIGENCE, 2024, 54 (08) : 6069 - 6084
  • [2] Deblurring transformer tracking with conditional cross-attention
    Sun, Fuming
    Zhao, Tingting
    Zhu, Bing
    Jia, Xu
    Wang, Fasheng
    MULTIMEDIA SYSTEMS, 2023, 29 (03) : 1131 - 1144
  • [4] Cross-Attention Transformer for Video Interpolation
    Kim, Hannah Halin
    Yu, Shuzhi
    Yuan, Shuai
    Tomasi, Carlo
    COMPUTER VISION - ACCV 2022 WORKSHOPS, 2023, 13848 : 325 - 342
  • [5] Spatial-Spectral Transformer With Cross-Attention for Hyperspectral Image Classification
    Peng, Yishu
    Zhang, Yuwen
    Tu, Bing
    Li, Qianming
    Li, Wujing
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
  • [6] Spatial Cross-Attention RGB-D Fusion Module for Object Detection
    Gao, Shangyin
    Markhasin, Lev
    Wang, Bi
    IEEE MMSP 2021: 2021 IEEE 23RD INTERNATIONAL WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING (MMSP), 2021,
  • [7] An efficient object tracking based on multi-head cross-attention transformer
    Dai, Jiahai
    Li, Huimin
    Jiang, Shan
    Yang, Hongwei
    EXPERT SYSTEMS, 2025, 42 (02)
  • [8] Deformable Cross-Attention Transformer for Medical Image Registration
    Chen, Junyu
    Liu, Yihao
    He, Yufan
    Du, Yong
    MACHINE LEARNING IN MEDICAL IMAGING, MLMI 2023, PT I, 2024, 14348 : 115 - 125
  • [9] CAR-Transformer: Cross-Attention Reinforcement Transformer for Cross-Lingual Summarization
    Cai, Yuang
    Yuan, Yuyu
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 16, 2024, : 17718 - 17726
  • [10] Two-stream cross-attention vision Transformer based on RGB-D images for pig weight estimation
    He, Wei
    Mi, Yang
    Ding, Xiangdong
    Liu, Gang
    Li, Tao
    COMPUTERS AND ELECTRONICS IN AGRICULTURE, 2023, 212