A robust attention-enhanced network with transformer for visual tracking

Times cited: 7
Authors
Gu, Fengwei [1 ,2 ]
Lu, Jun [1 ,2 ]
Cai, Chengtao [1 ,2 ]
Affiliations
[1] Harbin Engn Univ, Coll Intelligent Syst Sci & Engn, Harbin 150001, Peoples R China
[2] Harbin Engn Univ, Key Lab Intelligent Technol & Applicat Marine Equi, Minist Educ, Harbin 150001, Peoples R China
Funding
National Natural Science Foundation of China; Natural Science Foundation of Heilongjiang Province;
Keywords
Visual tracking; Attention-enhanced network; Local feature information association module; Global feature information fusion module; Prediction network; OBJECT TRACKING; SIAMESE;
DOI
10.1007/s11042-023-15168-5
Chinese Library Classification (CLC)
TP [automation technology; computer technology];
Discipline code
0812 ;
Abstract
Recently, Siamese-based trackers have become particularly popular. The correlation module in these trackers fuses feature information from the template and the search region to obtain the response results. However, video sequences contain very rich contextual information and feature dependencies, and a simple correlation module struggles to integrate the useful information efficiently. As a result, the tracker suffers from information loss and local optimal solutions. In this work, we propose a novel attention-enhanced network with a Transformer variant for robust visual tracking. The proposed method carefully designs a local feature information association (LFIA) module and a global feature information fusion (GFIF) module based on the attention mechanism, which effectively exploit contextual information and feature dependencies to enhance the feature information. Our approach transforms the visual tracking problem into a bounding box prediction problem, using only a simple prediction network for object localization, without any prior knowledge. Ultimately, we propose a robust tracker called RANformer. Experiments show that the proposed tracker achieves state-of-the-art performance on 7 popular tracking benchmarks while meeting real-time requirements at a speed exceeding 40 FPS.
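The abstract's central idea, replacing a plain correlation module with attention-based fusion of template and search-region features, can be sketched as scaled dot-product cross-attention: queries come from the search region, while keys and values come from the template, so each search token aggregates the template information most relevant to it. This is a minimal illustrative sketch only, not the paper's actual LFIA/GFIF design; all function and variable names here are hypothetical.

```python
import numpy as np

def cross_attention_fusion(search_feats, template_feats):
    """Fuse template features into search-region features via scaled
    dot-product cross-attention.

    search_feats:   (Ns, d) tokens from the search region (queries)
    template_feats: (Nt, d) tokens from the template (keys and values)
    Returns:        (Ns, d) attention-enhanced search features
    """
    d = search_feats.shape[-1]
    # Similarity between each search token and each template token
    scores = search_feats @ template_feats.T / np.sqrt(d)   # (Ns, Nt)
    # Numerically stable row-wise softmax over template tokens
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each search token becomes a weighted mix of template features
    return weights @ template_feats                          # (Ns, d)

rng = np.random.default_rng(0)
search = rng.standard_normal((16, 64))    # 16 search-region tokens
template = rng.standard_normal((4, 64))   # 4 template tokens
fused = cross_attention_fusion(search, template)
print(fused.shape)  # (16, 64)
```

In contrast to depth-wise correlation, which produces a single similarity response, this formulation lets every search location attend to all template locations at once, which is the kind of global feature dependency the abstract argues a simple correlation module cannot capture.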
Pages: 40761-40782
Page count: 22