Exploiting spatial and temporal context for online tracking with improved transformer

Cited by: 0
Authors
Zhang, Jianwei [1]
Wang, Jingchao [1]
Zhang, Huanlong [2]
Miao, Mengen [1]
Zhang, Jie [2]
Wu, Di [3]
Affiliations
[1] Zhengzhou Univ Light Ind, Coll Software Engn, Zhengzhou 475000, Peoples R China
[2] Zhengzhou Univ Light Ind, Coll Elect & Informat Engn, Zhengzhou 475000, Peoples R China
[3] Yellow River Engn Consulting Co Ltd, Zhengzhou 450003, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Visual tracking; Classification and regression network; Spatial and temporal context; Transformer;
DOI
10.1016/j.imavis.2023.104672
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
The transformer is increasingly popular in computer vision tasks because self-attention lets it capture long-range dependencies. In this paper, we propose TrCAR, a transformer-based classification and regression network that uses the transformer to exploit deeper spatial and temporal context. Unlike the classic transformer architecture, we introduce convolution operations into the transformer and change how features are computed so that it suits the tracking task. The improved transformer encoder is then introduced into the regression branch of TrCAR and combined with a feature pyramid to perform multi-layer feature fusion, which helps obtain a high-quality target representation. To let the target model adapt to changes in target appearance, we apply gradient descent in the regression branch so that it can be updated online and produce a more precise bounding box. Meanwhile, the new transformer is integrated into the classification branch of TrCAR, where its global computing capability extracts the essential features of the target across historical frames and cross-attention uses them to emphasize the target position in the current frame, which helps the classifier identify the correct target more easily. Experimental results on the OTB, LaSOT, VOT2018, NFS, GOT-10k, and TrackingNet benchmarks show that TrCAR achieves performance comparable to that of popular trackers.
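To make the cross-attention idea in the abstract concrete, the following is a minimal sketch of standard single-head scaled dot-product cross-attention, not the TrCAR implementation. It assumes queries come from current-frame locations and keys/values from historical frames; the toy feature vectors, the target mask, and the function names are all hypothetical, and the paper's convolutional modifications to the transformer are not reproduced here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention without learned projections.

    queries: current-frame features, one vector per location
    keys:    historical-frame features, one vector per location
    values:  information attached to each historical location
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this current-frame location to every historical location.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the historical values.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs


# Hypothetical toy data (assumption, not from the paper).
history = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # two target-like locations, one background
target_mask = [[1.0], [1.0], [0.0]]             # 1 = target, 0 = background
current = [[1.0, 0.0], [0.0, 1.0]]              # target-like and background-like queries

response = cross_attention(current, history, target_mask)
print(response)
```

In this toy setting the current-frame location that resembles the historical target features receives the larger response, which is the sense in which cross-attention can emphasize the target position.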
Pages: 11