Exploiting spatial and temporal context for online tracking with improved transformer

Cited by: 0
Authors
Zhang, Jianwei [1]
Wang, Jingchao [1]
Zhang, Huanlong [2]
Miao, Mengen [1]
Zhang, Jie [2]
Wu, Di [3]
Affiliations
[1] Zhengzhou Univ Light Ind, Coll Software Engn, Zhengzhou 475000, Peoples R China
[2] Zhengzhou Univ Light Ind, Coll Elect & Informat Engn, Zhengzhou 475000, Peoples R China
[3] Yellow River Engn Consulting Co Ltd, Zhengzhou 450003, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Visual tracking; Classification and regression network; Spatial and temporal context; Transformer
DOI
10.1016/j.imavis.2023.104672
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Transformers are increasingly popular in computer vision because self-attention captures long-range dependencies. In this paper, we propose TrCAR, a transformer-based classification and regression network that uses the transformer to exploit deeper spatial and temporal context. Unlike the classic transformer architecture, we introduce convolution operations into the transformer and change how features are computed to suit the tracking task. The improved transformer encoder is then introduced into the regression branch of TrCAR and combined with a feature pyramid to perform multi-layer feature fusion, which helps obtain a high-quality target representation. To further let the target model adapt to changes in target appearance, we apply gradient descent in the regression branch so that it can be updated online to produce a more precise bounding box. Meanwhile, the new transformer is integrated into the classification branch of TrCAR, where its global computing capability extracts, as far as possible, the essential features of the target across historical frames and uses them to emphasize the target position in the current frame via cross-attention, which helps the classifier identify the correct target more easily. Experimental results on the OTB, LaSOT, VOT2018, NFS, GOT-10k, and TrackingNet benchmarks show that TrCAR achieves performance comparable to popular trackers.
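The cross-attention step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' TrCAR implementation: it is plain scaled dot-product attention with no learned projections and no convolution, and the names `cross_attention`, `history`, and `current`, as well as the toy feature values, are assumptions made purely for illustration. Current-frame features act as queries, and features from historical frames act as keys and values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention over plain lists of vectors.

    Each query (a current-frame feature) is replaced by a weighted
    average of the values (historical-frame features), with weights
    given by query-key similarity.
    """
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys
        ]
        weights = softmax(scores)
        out.append(
            [
                sum(w * v[j] for w, v in zip(weights, values))
                for j in range(len(values[0]))
            ]
        )
    return out


# Hypothetical toy features: three historical-frame vectors (keys and
# values) and two current-frame vectors (queries).
history = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
current = [[1.0, 0.0], [0.0, 1.0]]
attended = cross_attention(current, history, history)
```

In this toy setting, each output row is a convex combination of the historical vectors, weighted toward those most similar to the query. A real tracker would apply learned query, key, and value projections to feature maps before this step.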
Pages: 11