Joint spatio-temporal modeling for visual tracking

Cited by: 5

Authors:

Sun, Yumei [1,2,3,4,5]

Tang, Chuanming [1,2,3,4,5]

Luo, Hui [1,2,3,4,5]

Li, Qingqing [1,2,3,5]

Peng, Xiaoming [5]

Zhang, Jianlin [1,2,3,4,5]

Li, Meihui [1,2,3,5]

Wei, Yuxing [1,2,3,5]

Affiliations:

[1] Univ Chinese Acad Sci, Sch Elect Elect & Commun Engn, Beijing 108408, Peoples R China

[2] Chinese Acad Sci, Key Lab Opt Engn, Chengdu 610209, Peoples R China

[3] Chinese Acad Sci, Inst Opt & Elect, Chengdu 610209, Peoples R China

[4] Chinese Acad Sci, Natl Key Lab Opt Field Manipulat Sci & Technol, Chengdu 610209, Peoples R China

[5] Univ Elect Sci & Technol China, Sch Automat Engn, Chengdu 611731, Peoples R China

Source:

KNOWLEDGE-BASED SYSTEMS | 2024 / Vol. 283

Keywords:

Visual tracking; Siamese trackers; Sequence prediction; Spatio-temporal model

DOI:

10.1016/j.knosys.2023.111206

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Similarity-based approaches have made significant progress in visual object tracking (VOT). Although these methods work well in simple scenes, they ignore the continuous spatio-temporal connection of the object across the video sequence. As a result, tracking by spatial matching alone can fail in the presence of distractors and occlusion. In this paper, we propose a joint spatio-temporal modeling tracker named STTrack, which implicitly builds continuous connections between the temporal and spatial aspects of the sequence. Specifically, we first design a time-sequence iteration strategy (TSIS) to concentrate on the temporal connection of the object in the video sequence. Then, we propose a novel spatial-temporal interaction Transformer network (STIN) to capture the spatio-temporal correlation of the object between frames. The proposed STIN module is robust to object occlusion because it explores the dynamic state-change dependencies of the object. Finally, we introduce a spatio-temporal query to suppress distractors by iteratively propagating the target prior. Extensive experiments on six tracking benchmark datasets demonstrate that the proposed STTrack achieves excellent performance while operating in real time. The code is publicly available at https://github.com/nubsym/STTrack.

Pages: 10
