SCATT: Transformer tracking with symmetric cross-attention

Cited by: 1
Authors
Zhang, Jianming [1]
Chen, Wentao [1]
Dai, Jiangxin [1]
Zhang, Jin [1,2]
Affiliations
[1] Changsha Univ Sci & Technol, Sch Comp & Commun Engn, Changsha 410076, Peoples R China
[2] Zhejiang Univ, State Key Lab Ind Control & Technol, Hangzhou 310058, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Visual tracking; Transformer; Symmetric cross-attention; Position information enhancement; Object tracking
DOI
10.1007/s10489-024-05467-1
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In popular Siamese network trackers, cross-correlation uses similarity to find the exact location of the template in the search region. However, because cross-correlation primarily focuses on spatial neighborhoods, it often falls into a local optimum. Additionally, repeated feature fusion degrades the target position information. To address these issues, we propose a novel transformer-variant tracker. We make cross-attention play a central role in our tracker and thus propose a novel symmetric cross-attention that effectively fuses the features of the template and the search region. The symmetric cross-attention uses only the cross-attention mechanism, dispensing with the cross-correlation operation, which avoids local optima and captures more global information. We also propose a position information enhancement module that preserves more horizontal and vertical position information, which avoids the loss of position information caused by repeated feature fusion and helps the tracker locate the target more accurately. Our proposed tracker achieves state-of-the-art performance on six benchmarks, including GOT-10k, TrackingNet, LaSOT, UAV123, OTB100, and VOT2020, and runs at real-time speed.
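The abstract names the mechanism but does not spell out the layer design, so the following is only a generic sketch of two-way cross-attention between template and search tokens, not the paper's architecture. It assumes a single head, no learned query/key/value projections, and a plain residual connection; the function names (`cross_attention`, `bidirectional_cross_attention`) and the token counts and feature width are hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context):
    """Scaled dot-product attention: each query token attends to all context tokens.

    Assumption: single head, no learned projections (illustration only).
    """
    d = queries.shape[-1]
    scores = queries @ context.T / np.sqrt(d)   # (n_queries, n_context)
    weights = softmax(scores, axis=-1)          # each row sums to 1
    return weights @ context, weights


def bidirectional_cross_attention(template, search):
    """Hypothetical two-way fusion: template attends to search and vice versa."""
    t_out, t_w = cross_attention(template, search)
    s_out, s_w = cross_attention(search, template)
    # Residual connections keep each branch's original features.
    return template + t_out, search + s_out, t_w, s_w


rng = np.random.default_rng(0)
template = rng.standard_normal((64, 32))    # e.g. 8x8 template tokens, 32 channels
search = rng.standard_normal((256, 32))     # e.g. 16x16 search tokens, 32 channels

t_fused, s_fused, t_w, s_w = bidirectional_cross_attention(template, search)
print(t_fused.shape, s_fused.shape)         # (64, 32) (256, 32)
```

Every search token weighs all template tokens (and the reverse), so the fusion is not limited to a local sliding window the way a cross-correlation kernel is; that is the global-context property the abstract refers to.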
Pages: 6069-6084
Page count: 16
Related papers
50 items in total
  • [41] A cross-attention integrated shifted window transformer for remote sensing image scene recognition with limited data
    Li, Kaiyuan
    Xue, Yong
    Zhao, Jiaqi
    Li, Honghao
    Zhang, Sheng
    [J]. Journal of Applied Remote Sensing, 2024, 18 (03)
  • [42] Learning Cross-Attention Discriminators via Alternating Time-Space Transformers for Visual Tracking
    Wang, Wuwei
    Zhang, Ke
    Su, Yu
    Wang, Jingyu
    Wang, Qi
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, : 1 - 14
  • [43] Dual Cross-Attention for medical image segmentation
    Ates, Gorkem Can
    Mohan, Prasoon
    Celik, Emrah
    [J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 126
  • [44] Iterative Geographic Entity Alignment with Cross-Attention
    Dsouza, Alishiba
    Yu, Ran
    Windoffer, Moritz
    Demidova, Elena
    [J]. SEMANTIC WEB, ISWC 2023, PART I, 2023, 14265 : 216 - 233
  • [45] Cross-Parallel Attention and Efficient Match Transformer for Aerial Tracking
    Deng, Anping
    Han, Guangliang
    Zhang, Zhongbo
    Chen, Dianbing
    Ma, Tianjiao
    Liu, Zhichao
    [J]. REMOTE SENSING, 2024, 16 (06)
  • [46] Cross-Attention Regression Flow for Defect Detection
    Liu, Binhui
    Guo, Tianchu
    Luo, Bin
    Cui, Zhen
    Yang, Jian
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33 : 5183 - 5193
  • [47] AiATrack: Attention in Attention for Transformer Visual Tracking
    Gao, Shenyuan
    Zhou, Chunluan
    Ma, Chao
    Wang, Xinggang
    Yuan, Junsong
    [J]. COMPUTER VISION, ECCV 2022, PT XXII, 2022, 13682 : 146 - 164
  • [48] Accurate Multi-contrast MRI Super-Resolution via a Dual Cross-Attention Transformer Network
    Huang, Shoujin
    Li, Jingyu
    Mei, Lifeng
    Zhang, Tan
    Chen, Ziran
    Dong, Yu
    Dong, Linzheng
    Liu, Shaojun
    Lyu, Mengye
    [J]. MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2023, PT X, 2023, 14229 : 313 - 322
  • [49] Wild Terrestrial Animal Re-Identification Based on an Improved Locally Aware Transformer with a Cross-Attention Mechanism
    Zheng, Zhaoxiang
    Zhao, Yaqin
    Li, Ao
    Yu, Qiuping
    [J]. ANIMALS, 2022, 12 (24)
  • [50] CerviFormer: A pap smear-based cervical cancer classification method using cross-attention and latent transformer
    Deo, Bhaswati Singha
    Pal, Mayukha
    Panigrahi, Prasanta K.
    Pradhan, Asima
    [J]. INTERNATIONAL JOURNAL OF IMAGING SYSTEMS AND TECHNOLOGY, 2024, 34 (02)