Unified spatio-temporal attention mixformer for visual object tracking

Authors
Park, Minho [1]
Yoon, Gang-Joon [2]
Song, Jinjoo [1]
Yoon, Sang Min [1]
Affiliations
[1] Kookmin Univ, Coll Comp Sci, HCI Lab, 77 Jeongneung Ro, Seoul 02707, South Korea
[2] Natl Inst Math Sci, 70 Yuseong Daero 1689 Beon Gil, Daejeon 34047, South Korea
Keywords
Visual object tracking; Unified vision transformer; Spatio-temporal model; FILTER;
DOI
10.1016/j.engappai.2024.108682
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In this paper, we present a unified spatio-temporal attention MixFormer framework for visual object tracking. Within the vision transformer framework, we design a cohesive network consisting of target template and search region feature extraction, cross-attention utilizing spatial and temporal information, and task-specific heads, all operating in an end-to-end manner. Incorporating spatial and temporal attention modules within the network enables simultaneous feature extraction and emphasis, allowing the model to concentrate on target-specific discriminative features despite changes in illumination, occlusion, scale, camera pose, and background clutter. Stacking multiple non-hierarchical blocks allows meaningful features to be extracted while irrelevant features are discarded from the provided target template and search region. The simultaneous spatio-temporal attention module is employed to accentuate target appearance features and alleviate variation in the object state across frame sequences. Qualitative and quantitative analysis, including ablation tests based on various tracking benchmarks, validates the robustness of the proposed tracking methodology.
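The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of attending jointly over template and search-region tokens. It assumes, purely for illustration, that temporal information enters as an extra group of template tokens taken from an earlier frame, and it omits learned projections, multiple heads, and the task-specific heads. The names `softmax`, `attention`, and `mixed_attention` are hypothetical and are not taken from the paper.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    dim = len(values[0])
    out = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(dim)])
    return out


def mixed_attention(token_groups):
    """Attend jointly over all token groups, then split the result back.

    Assumption (not from the paper): each group is a list of token vectors,
    e.g. [initial template, earlier-frame template, search region].
    """
    tokens = [tok for group in token_groups for tok in group]
    mixed = attention(tokens, tokens, tokens)
    result, start = [], 0
    for group in token_groups:
        result.append(mixed[start:start + len(group)])
        start += len(group)
    return result


# Toy 2-D tokens: two template tokens, one earlier-frame token, three search tokens.
template = [[1.0, 0.0], [0.0, 1.0]]
earlier_template = [[0.9, 0.1]]
search = [[1.0, 1.0], [0.5, 0.0], [0.0, 0.5]]
t_out, e_out, s_out = mixed_attention([template, earlier_template, search])
```

Because every token attends to every other token in one pass, feature extraction and template-to-search interaction happen together in this sketch, which is the property the abstract attributes to the unified design.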
Pages: 10