Joint spatio-temporal modeling for visual tracking

Cited by: 5
Authors
Sun, Yumei [1, 2, 3, 4, 5]
Tang, Chuanming [1, 2, 3, 4, 5]
Luo, Hui [1, 2, 3, 4, 5]
Li, Qingqing [1, 2, 3, 5]
Peng, Xiaoming [5]
Zhang, Jianlin [1, 2, 3, 4, 5]
Li, Meihui [1, 2, 3, 5]
Wei, Yuxing [1, 2, 3, 5]
Affiliations
[1] Univ Chinese Acad Sci, Sch Elect Elect & Commun Engn, Beijing 108408, Peoples R China
[2] Chinese Acad Sci, Key Lab Opt Engn, Chengdu 610209, Peoples R China
[3] Chinese Acad Sci, Inst Opt & Elect, Chengdu 610209, Peoples R China
[4] Chinese Acad Sci, Natl Key Lab Opt Field Manipulat Sci & Technol, Chengdu 610209, Peoples R China
[5] Univ Elect Sci & Technol China, Sch Automat Engn, Chengdu 611731, Peoples R China
Keywords
Visual tracking; Siamese trackers; Sequence prediction; Spatio-temporal model
DOI
10.1016/j.knosys.2023.111206
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Similarity-based approaches have made significant progress in visual object tracking (VOT). Although these methods work well in simple scenes, they ignore the continuous spatio-temporal connection of the object across the video sequence. As a result, tracking by spatial matching alone can fail in the presence of distractors and occlusion. In this paper, we propose a joint spatio-temporal modeling tracker named STTrack, which implicitly builds continuous connections between the temporal and spatial aspects of the sequence. Specifically, we first design a time-sequence iteration strategy (TSIS) to concentrate on the temporal connection of the object in the video sequence. Then, we propose a novel spatial-temporal interaction Transformer network (STIN) to capture the spatio-temporal correlation of the object between frames. The proposed STIN module is robust to object occlusion because it explores the dependencies among the object's dynamic state changes. Finally, we introduce a spatio-temporal query to suppress distractors by iteratively propagating the target prior. Extensive experiments on six tracking benchmark datasets demonstrate that the proposed STTrack achieves excellent performance while operating in real time. The code is publicly available at https://github.com/nubsym/STTrack.
Pages: 10
Related papers
50 in total
  • [21] Adaptive Spatio-Temporal Context Learning for Visual Target Tracking
    Marvasti-Zadeh, Seyed Mojtaba
    Ghanei-Yakhdan, Hossein
    Kasaei, Shohreh
    2017 10TH IRANIAN CONFERENCE ON MACHINE VISION AND IMAGE PROCESSING (MVIP), 2017: 10-14
  • [22] Memory Prompt for Spatio-Temporal Transformer Visual Object Tracking
    Xu T.
    Wu X.
    Zhu X.
    Kittler J.
    IEEE Transactions on Artificial Intelligence, 2024, 5(08): 1-6
  • [23] Robust Visual Tracking with Dual Spatio-Temporal Context Trackers
    Sun, Shiyan
    Zhang, Hong
    Yuan, Ding
    SEVENTH INTERNATIONAL CONFERENCE ON GRAPHIC AND IMAGE PROCESSING (ICGIP 2015), 2015, 9817
  • [24] Tracking and Modeling of Spatio-Temporal Fields with a Mobile Sensor Network
    Lu, Bowen
    Gu, Dongbing
    Hu, Huosheng
    2014 11TH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION (WCICA), 2014: 2711-2716
  • [25] Airborne target tracking based on spatio-temporal saliency modeling
    Zhang W.
    Zhong S.
    Wang J.
    Hangkong Xuebao/Acta Aeronautica et Astronautica Sinica, 2020, 41(03)
  • [26] Modeling and tracking of spatio-temporal scalar fields with multiple robots
    Yan, Chuanbo
    Zhang, Tao
    PROCEEDINGS OF THE 36TH CHINESE CONTROL CONFERENCE (CCC 2017), 2017: 8548-8552
  • [27] Aberrance suppressed spatio-temporal correlation filters for visual object tracking
    Elayaperumal, Dinesh
    Joo, Young Hoon
    PATTERN RECOGNITION, 2021, 115
  • [28] Robust Online Learned Spatio-Temporal Context Model for Visual Tracking
    Wen, Longyin
    Cai, Zhaowei
    Lei, Zhen
    Yi, Dong
    Li, Stan Z.
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2014, 23(02): 785-796
  • [29] Learning spatio-temporal context via hierarchical features for visual tracking
    Cao, Yi
    Ji, Hongbing
    Zhang, Wenbo
    Xue, Fei
    SIGNAL PROCESSING-IMAGE COMMUNICATION, 2018, 66: 50-65
  • [30] DASTSiam: Spatio-temporal fusion and discriminative enhancement for Siamese visual tracking
    Huang, Yucheng
    Firkat, Eksan
    Zhang, Jinlai
    Zhu, Lijuan
    Zhu, Bin
    Zhu, Jihong
    Hamdulla, Askar
    IET COMPUTER VISION, 2023, 17(08): 1017-1033