Joint spatio-temporal modeling for visual tracking

Cited by: 5
Authors
Sun, Yumei [1, 2, 3, 4, 5]
Tang, Chuanming [1, 2, 3, 4, 5]
Luo, Hui [1, 2, 3, 4, 5]
Li, Qingqing [1, 2, 3, 5]
Peng, Xiaoming [5]
Zhang, Jianlin [1, 2, 3, 4, 5]
Li, Meihui [1, 2, 3, 5]
Wei, Yuxing [1, 2, 3, 5]
Affiliations
[1] Univ Chinese Acad Sci, Sch Elect Elect & Commun Engn, Beijing 108408, Peoples R China
[2] Chinese Acad Sci, Key Lab Opt Engn, Chengdu 610209, Peoples R China
[3] Chinese Acad Sci, Inst Opt & Elect, Chengdu 610209, Peoples R China
[4] Chinese Acad Sci, Natl Key Lab Opt Field Manipulat Sci & Technol, Chengdu 610209, Peoples R China
[5] Univ Elect Sci & Technol China, Sch Automat Engn, Chengdu 611731, Peoples R China
Keywords
Visual tracking; Siamese trackers; Sequence prediction; Spatio-temporal model
DOI
10.1016/j.knosys.2023.111206
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Similarity-based approaches have made significant progress in visual object tracking (VOT). Although these methods work well in simple scenes, they ignore the continuous spatio-temporal connection of the object in the video sequence. For this reason, tracking by spatial matching alone can fail in the presence of distractors and occlusion. In this paper, we propose a spatio-temporal joint-modeling tracker named STTrack, which implicitly builds continuous connections between the temporal and spatial aspects of the sequence. Specifically, we first design a time-sequence iteration strategy (TSIS) to concentrate on the temporal connection of the object in the video sequence. Then, we propose a novel spatial-temporal interaction Transformer network (STIN) to capture the spatio-temporal correlation of the object between frames. The proposed STIN module is robust to object occlusion because it explores the dynamic state-change dependencies of the object. Finally, we introduce a spatio-temporal query to suppress distractors by iteratively propagating the target prior. Extensive experiments on six tracking benchmark datasets demonstrate that the proposed STTrack achieves excellent performance while operating in real time. The code is publicly available at https://github.com/nubsym/STTrack.
Pages: 10