Adaptive sparse attention-based compact transformer for object tracking

Cited by: 0
Authors
Pan, Fei [1 ]
Zhao, Lianyu [1 ]
Wang, Chenglin [2 ]
Affiliations
[1] Tianjin Univ Technol, Sch Comp Sci & Engn, Liqizhuang St, Tianjin 300384, Peoples R China
[2] Tianjin Univ Technol, Sch Mech Engn, Liqizhuang St, Tianjin 300384, Peoples R China
Source
SCIENTIFIC REPORTS, 2024, Vol. 14, Issue 1
Keywords
Object tracking; Siamese network; Transformer; Adaptive sparse attention;
DOI
10.1038/s41598-024-63028-5
Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline codes
07; 0710; 09;
Abstract
Transformer-based Siamese networks have excelled in the field of object tracking. Nevertheless, a notable limitation persists in their reliance on ResNet as the backbone, which lacks the capacity to effectively capture global information and exhibits constraints in feature representation. Furthermore, these trackers struggle to attend to target-relevant information within the search region using multi-head self-attention (MSA). Additionally, they are prone to robustness challenges during online tracking and tend to exhibit significant model complexity. To address these limitations, we propose a novel tracker named ASACTT, which comprises a backbone network, a feature fusion network, and a prediction head. First, we improve Swin-Transformer-Tiny to enhance its global information extraction capabilities. Second, we propose an adaptive sparse attention (ASA) mechanism to focus on target-specific details within the search region. Third, we leverage position encoding and historical candidate data to develop a dynamic template updater (DTU), which preserves the integrity of the initial frame while gracefully adapting to variations in the target's appearance. Finally, we optimize the network model to maintain accuracy while minimizing complexity. Experiments on five benchmark datasets demonstrate that ASACTT is highly competitive with other state-of-the-art methods. Notably, on the GOT-10K evaluation, our tracker achieved an outstanding success score of 75.3% at 36 FPS, significantly surpassing other trackers with comparable model parameters.
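The abstract does not spell out how ASA decides which positions in the search region to attend to. A common way to realize sparse attention is to keep only the top-k highest-scoring keys per query and mask the rest before the softmax; the NumPy sketch below illustrates that general idea only. The function name `sparse_attention`, the single-head setup, and the fixed top-k rule are all assumptions for illustration, not the authors' implementation (their ASA is described as adaptive).

```python
import numpy as np

def sparse_attention(q, k, v, top_k=4):
    """Single-head scaled dot-product attention that keeps only the
    top_k highest-scoring keys per query and masks out the rest.

    q: (Lq, d) queries, k: (Lk, d) keys, v: (Lk, dv) values.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                 # (Lq, Lk) similarity scores
    # Threshold per row: the top_k-th largest score of each query.
    kth = np.sort(scores, axis=-1)[:, -top_k][:, None]
    # Mask everything below the threshold to -inf so softmax zeroes it out.
    masked = np.where(scores >= kth, scores, -np.inf)
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                            # (Lq, dv) attended output
```

With `top_k` equal to the number of keys this reduces to ordinary dense attention; smaller values force each query to commit to a few (ideally target-relevant) positions, which is the intuition behind sparsifying MSA over the search region.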
Pages: 14
Related papers
50 total
  • [1] Visual Tracking Based on the Adaptive Color Attention Tuned Sparse Generative Object Model
    Tian, Chunna
    Gao, Xinbo
    Wei, Wei
    Zheng, Hong
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2015, 24 (12) : 5236 - 5248
  • [2] Attention-based adaptive structured continuous sparse network pruning
    Liu, Jiaxin
    Liu, Wei
    Li, Yongming
    Hu, Jun
    Cheng, Shuai
    Yang, Wenxing
    NEUROCOMPUTING, 2024, 590
  • [3] AFMtrack: Attention-Based Feature Matching for Multiple Object Tracking
    Cuong Bui, Duy
    Anh Hoang, Hiep
    Yoo, Myungsik
    IEEE ACCESS, 2024, 12 : 82897 - 82910
  • [4] Transformer visual object tracking algorithm based on mixed attention
    Hou Z.-Q.
    Guo F.
    Yang X.-L.
    Ma S.-G.
    Fan J.-L.
    Kongzhi yu Juece/Control and Decision, 2024, 39 (03): : 739 - 748
  • [5] Channel and spatial attention-based Siamese network for visual object tracking
    Tian, Shishun
    Chen, Zixi
    Chen, Bolin
    Zou, Wenbin
    Li, Xia
    JOURNAL OF ELECTRONIC IMAGING, 2021, 30 (03)
  • [6] Adaptive object tracking based on spatial attention mechanism
    Xie Y.
    Chen Y.
    Xi Tong Gong Cheng Yu Dian Zi Ji Shu/Systems Engineering and Electronics, 2019, 41 (09): : 1945 - 1954
  • [7] Sparse Transformer-Based Sequence Generation for Visual Object Tracking
    Tian, Dan
    Liu, Dong-Xin
    Wang, Xiao
    Hao, Ying
IEEE ACCESS, 2024, 12 : 154418 - 154425
  • [8] Highly compact adaptive network based on transformer for RGBT tracking
    Chen, Siqing
    Gao, Pan
    Wang, Xun
    Liao, Kuo
    Zhang, Ping
    INFRARED PHYSICS & TECHNOLOGY, 2024, 139
  • [9] Salient Feature Enhanced Multi-object Tracking with Soft-Sparse Attention in Transformer
    Liu, Caihua
    Qu, Xu
    Ma, Xiaoyi
    Li, Runze
    Li, Xu
    Chen, Sichu
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT XII, 2024, 14436 : 392 - 404
  • [10] Efficient Siamese model for visual object tracking with attention-based fusion modules
    Zhou, Wenjun
    Liu, Yao
    Wang, Nan
    Liang, Dong
    Peng, Bo
    SIGNAL IMAGE AND VIDEO PROCESSING, 2024, : 7801 - 7810