Exploiting spatial and temporal context for online tracking with improved transformer

Times cited: 0

Authors:

Zhang, Jianwei [1]

Wang, Jingchao [1]

Zhang, Huanlong [2]

Miao, Mengen [1]

Zhang, Jie [2]

Wu, Di [3]

Affiliations:

[1] Zhengzhou Univ Light Ind, Coll Software Engn, Zhengzhou 475000, Peoples R China

[2] Zhengzhou Univ Light Ind, Coll Elect & Informat Engn, Zhengzhou 475000, Peoples R China

[3] Yellow River Engn Consulting Co Ltd, Zhengzhou 450003, Peoples R China

Source:

IMAGE AND VISION COMPUTING | 2023 / Vol. 133

Funding:

National Natural Science Foundation of China;

Keywords:

Visual tracking; Classification and regression network; Spatial and temporal context; Transformer;

DOI:

10.1016/j.imavis.2023.104672

CLC (Chinese Library Classification) number:

TP18 [Theory of artificial intelligence];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Transformers are increasingly popular in computer vision tasks because self-attention can capture long-range dependencies. In this paper, we propose TrCAR, a transformer-based classification-and-regression network that uses the transformer to exploit deeper spatial and temporal context. Unlike the classic transformer architecture, our design introduces convolution operations into the transformer and changes how features are computed, making it suitable for the tracking task. The improved transformer encoder is then placed in the regression branch of TrCAR and combined with a feature pyramid to perform multi-layer feature fusion, which helps produce a high-quality target representation. To let the target model adapt to changes in target appearance, we apply gradient descent to the regression branch so that it can be updated online and produce more precise bounding boxes. In addition, the new transformer is integrated into the classification branch of TrCAR. There, its global computation extracts the essential features of the target across historical frames as fully as possible, and cross-attention uses these features to emphasize the target position in the current frame, which makes it easier for the classifier to identify the correct target. Experimental results on the OTB, LaSOT, VOT2018, NFS, GOT-10k, and TrackingNet benchmarks show that TrCAR achieves performance comparable to that of popular trackers.
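The abstract names two generic mechanisms: cross-attention from the current frame to features gathered across historical frames (classification branch), and online gradient-descent updates of the regression branch. The NumPy sketch below illustrates both under stated assumptions: a single attention head with hypothetical projection matrices `w_q`, `w_k`, `w_v`, and a linear box regressor refined on a squared-error loss. The function names, shapes and learning rate are illustrative choices, not TrCAR's implementation, and the convolutional changes inside the transformer and the feature-pyramid fusion are not modeled.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the maximum along the axis for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(current, memory, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention (illustrative only).

    current: (N, C) tokens from the current frame (queries)
    memory:  (M, C) tokens collected from historical frames (keys/values)
    w_q, w_k, w_v: (C, C) projection matrices (hypothetical parameters)
    Returns the enhanced current-frame tokens and the (N, M) attention map.
    """
    q = current @ w_q
    k = memory @ w_k
    v = memory @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # similarity of each query to each memory token
    weights = softmax(scores, axis=-1)       # each row is a distribution over memory tokens
    # Residual connection (an assumption): historical context is added to the current features.
    return current + weights @ v, weights


def online_update(w, features, targets, lr=0.05, steps=50):
    """Refine a linear box regressor online with plain gradient descent on
    L(w) = ||features @ w - targets||_F^2 / S (a hypothetical stand-in for
    the paper's online-updated regression branch).

    w:        (C, 4) regressor weights
    features: (S, C) features of recent training samples
    targets:  (S, 4) box targets for those samples
    """
    for _ in range(steps):
        residual = features @ w - targets                   # (S, 4) prediction errors
        grad = 2.0 * features.T @ residual / len(features)  # gradient of L with respect to w
        w = w - lr * grad
    return w


def mse(w, features, targets):
    # Mean squared error of the linear regressor over all box coordinates.
    return float(np.mean((features @ w - targets) ** 2))


# Usage with random data (shapes chosen arbitrarily for illustration).
rng = np.random.default_rng(0)
C = 8
current = rng.standard_normal((16, C))
memory = rng.standard_normal((32, C))
w_q, w_k, w_v = (0.1 * rng.standard_normal((C, C)) for _ in range(3))
enhanced, attn = cross_attention(current, memory, w_q, w_k, w_v)

feats = rng.standard_normal((20, C))
boxes = feats @ rng.standard_normal((C, 4))
w0 = np.zeros((C, 4))
w1 = online_update(w0, feats, boxes)
print(enhanced.shape, attn.shape, mse(w0, feats, boxes) > mse(w1, feats, boxes))
```

The residual form in `cross_attention` follows the abstract's wording that historical context is used to emphasize, rather than replace, the target position in the current frame; whether TrCAR itself uses such a residual connection is an assumption of this sketch.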


Pages: 11
