Exploiting spatial and temporal context for online tracking with improved transformer

Cited: 0
Authors
Zhang, Jianwei [1 ]
Wang, Jingchao [1 ]
Zhang, Huanlong [2 ]
Miao, Mengen [1 ]
Zhang, Jie [2 ]
Wu, Di [3 ]
Affiliations
[1] Zhengzhou Univ Light Ind, Coll Software Engn, Zhengzhou 475000, Peoples R China
[2] Zhengzhou Univ Light Ind, Coll Elect & Informat Engn, Zhengzhou 475000, Peoples R China
[3] Yellow River Engn Consulting Co Ltd, Zhengzhou 450003, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Visual tracking; Classification and regression network; Spatial and temporal context; Transformer;
DOI
10.1016/j.imavis.2023.104672
CLC number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Transformers are becoming increasingly popular in computer vision tasks because self-attention can capture long-range dependencies. In this paper, we propose TrCAR, a transformer-based classification and regression network that exploits deeper spatial and temporal context. Unlike the classic transformer architecture, we introduce convolution into the transformer and change how features are computed to make it suitable for the tracking task. The improved transformer encoder is then introduced into the regression branch of TrCAR and combined with a feature pyramid to perform multi-layer feature fusion, which helps obtain a high-quality target representation. To further enable the target model to adapt to changes in target appearance, we apply gradient descent to the regression branch so that it can be updated online and produce a more precise bounding box. Meanwhile, the new transformer is integrated into the classification branch of TrCAR: it exploits the transformer's global computation to extract the essential features of the target across historical frames and uses cross-attention to emphasize the target position in the current frame, which helps the classifier identify the correct target more easily. Experimental results on the OTB, LaSOT, VOT2018, NFS, GOT-10k, and TrackingNet benchmarks show that TrCAR achieves performance comparable to popular trackers.
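The abstract describes two architectural ideas: a transformer encoder layer whose query/key/value projections are convolutional (so it operates directly on spatial feature maps), and a classification branch that uses cross-attention from the current frame to features gathered across historical frames. The following minimal PyTorch sketch illustrates those two ideas only; it is not the authors' TrCAR implementation, and the module names (ConvTransformerEncoderLayer, HistoryCrossAttention), channel width, head count, and wiring are illustrative assumptions.

import torch
import torch.nn as nn


class ConvTransformerEncoderLayer(nn.Module):
    # Self-attention over a CxHxW feature map with convolutional Q/K/V projections
    # (assumption: 3x3 convolutions; the abstract only states that convolution is
    # introduced into the transformer's feature computation).
    def __init__(self, channels=256, num_heads=8):
        super().__init__()
        self.q_proj = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.k_proj = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.v_proj = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(channels)
        self.norm2 = nn.LayerNorm(channels)
        self.ffn = nn.Sequential(
            nn.Linear(channels, 4 * channels),
            nn.ReLU(inplace=True),
            nn.Linear(4 * channels, channels),
        )

    def forward(self, x):
        b, c, h, w = x.shape
        # Convolutional projections, then flatten the spatial grid into tokens.
        q = self.q_proj(x).flatten(2).transpose(1, 2)   # (B, HW, C)
        k = self.k_proj(x).flatten(2).transpose(1, 2)
        v = self.v_proj(x).flatten(2).transpose(1, 2)
        tokens = x.flatten(2).transpose(1, 2)
        attn_out, _ = self.attn(q, k, v)
        tokens = self.norm1(tokens + attn_out)
        tokens = self.norm2(tokens + self.ffn(tokens))
        return tokens.transpose(1, 2).reshape(b, c, h, w)


class HistoryCrossAttention(nn.Module):
    # Emphasize target locations in the current frame by attending to feature
    # tokens accumulated from historical frames (a sketch of the classification
    # branch's cross-attention, with hypothetical shapes).
    def __init__(self, channels=256, num_heads=8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, current, history):
        # current: (B, C, H, W) feature map; history: (B, T*HW, C) tokens from past frames.
        b, c, h, w = current.shape
        query = current.flatten(2).transpose(1, 2)           # (B, HW, C)
        out, _ = self.cross_attn(query, history, history)    # attend to historical context
        return (query + out).transpose(1, 2).reshape(b, c, h, w)


if __name__ == "__main__":
    feat = torch.randn(2, 256, 18, 18)          # current-frame backbone features
    hist = torch.randn(2, 3 * 18 * 18, 256)     # tokens pooled from 3 past frames
    enc = ConvTransformerEncoderLayer()
    cls = HistoryCrossAttention()
    print(enc(feat).shape, cls(feat, hist).shape)  # both torch.Size([2, 256, 18, 18])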
Pages: 11
Related papers
50 records in total
  • [31] Exploiting Temporal Influence in Online Recommendation
    Palovics, Robert
    Benczur, Andras A.
    Kocsis, Levente
    Kiss, Tamas
    Frigo, Erzsebet
    PROCEEDINGS OF THE 8TH ACM CONFERENCE ON RECOMMENDER SYSTEMS (RECSYS'14), 2014, : 273 - 280
  • [32] Exploiting Spatial-Temporal Context for Interacting Hand Reconstruction on Monocular RGB Video
    Zhao, Weichao
    Hu, Hezhen
    Zhou, Wengang
    Li, Li
    Li, Houqiang
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2024, 20 (06)
  • [33] Exploiting spatial relationships for visual tracking
    Chen, Yao
    Li, Lunbo
    Guo, Jianhui
    Yang, Chen
    Zhang, Haofeng
    PATTERN RECOGNITION LETTERS, 2023, 175 : 16 - 22
  • [34] Robust Online Learned Spatio-Temporal Context Model for Visual Tracking
    Wen, Longyin
    Cai, Zhaowei
    Lei, Zhen
    Yi, Dong
    Li, Stan Z.
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2014, 23 (02) : 785 - 796
  • [35] An improved spatial-temporal regularization method for visual object tracking
    Hayat, Muhammad Umar
    Ali, Ahmad
    Khan, Baber
    Mehmood, Khizer
    Ullah, Khitab
    Amir, Muhammad
    SIGNAL IMAGE AND VIDEO PROCESSING, 2024, 18 (03) : 2065 - 2077
  • [36] Improved Side Information Generation for Distributed Video Coding by Exploiting Spatial and Temporal Correlations
    Shuiming Ye
    Mourad Ouaret
    Frederic Dufaux
    Touradj Ebrahimi
    EURASIP Journal on Image and Video Processing, 2009
  • [37] Improved Side Information Generation for Distributed Video Coding by Exploiting Spatial and Temporal Correlations
    Ye, Shuiming
    Ouaret, Mourad
    Dufaux, Frederic
    Ebrahimi, Touradj
    EURASIP JOURNAL ON IMAGE AND VIDEO PROCESSING, 2009,
  • [38] Long-term correlation tracking via spatial-temporal context
    Chen, Zhi
    Liu, Peizhong
    Du, Yongzhao
    Luo, Yanmin
    Guo, Jing-Ming
    VISUAL COMPUTER, 2020, 36 (02): : 425 - 442
  • [39] Dynamic feature fusion with spatial-temporal context for robust object tracking
    Nai, Ke
    Li, Zhiyong
    Wang, Haidong
    PATTERN RECOGNITION, 2022, 130
  • [40] Object tracking based on adaptive updating of a spatial-temporal context model
    Feng, Wanli
    Cen, Yigang
    Zeng, Xianyou
    Li, Zhetao
    Zeng, Ming
    Voronin, Viacheslav
    KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, 2017, 11 (11): : 5459 - 5473