STMT: Spatio-temporal memory transformer for multi-object tracking

Cited by: 2

Authors:

Gu, Songbo [1]

Ma, Jianxin [1]

Hui, Guancheng [1]

Xiao, Qiyang [1]

Shi, Wentao [2]

Affiliations:

[1] Henan Univ, Sch Artificial Intelligence, Zhengzhou 450001, Henan, Peoples R China

[2] Northwestern Polytech Univ, Sch Marine Sci & Technol, Xian 710072, Shaanxi, Peoples R China

Source:

APPLIED INTELLIGENCE | 2023 / Vol. 53 / Issue 20

Funding:

National Natural Science Foundation of China;

Keywords:

Deep learning; Multi-object tracking; Transformer; Memory; Spatio-temporal;

DOI:

10.1007/s10489-023-04617-1

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Modern online Multi-Object Tracking (MOT) methods typically first detect objects in each frame and then associate the detections across successive frames. However, high-quality trajectories are difficult to obtain under camera motion, fast object motion, and occlusion. To address these problems, this paper proposes a transformer-based MOT system named Spatio-Temporal Memory Transformer (STMT), which focuses on temporal and historical information. STMT includes a Spatio-Temporal Enhancement Module (STEM) that uses 3D convolution to model the spatial and temporal interactions of objects and to obtain features rich in spatio-temporal information. In addition, a Dynamic Spatio-Temporal Memory (DSTM) is presented to associate detections with tracklets. It contains three units: an Identity Aggregation Module (IAM), a Linear Dynamic Encoder (LD-Encoder), and a memory Decoder (Decoder). The IAM uses the geometric changes of objects to reduce the impact of deformation on tracking performance, the LD-Encoder captures the dependencies between objects, and the Decoder generates appearance similarity scores. Furthermore, a Score Fusion Equilibrium Strategy (SFES) is employed to balance the appearance similarity and position distance scores during fusion. Extensive experiments demonstrate that the proposed STMT approach is generally superior to state-of-the-art trackers on the MOT16 and MOT17 benchmarks.
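
The abstract does not give the SFES formula, so the following is only a minimal sketch of the general idea of fusing an appearance similarity score with a position score before associating tracklets with detections. Everything in it is an assumption made for illustration: the weighted sum with weight `alpha`, the use of box IoU as the position term, the greedy matching step, the `min_score` threshold, and all function names. It is not the authors' implementation.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical box format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def fuse_scores(appearance: List[List[float]],
                track_boxes: List[Box],
                det_boxes: List[Box],
                alpha: float = 0.5) -> List[List[float]]:
    """Assumed fusion rule: alpha * appearance + (1 - alpha) * IoU.

    appearance[t][d] is the appearance similarity between tracklet t
    and detection d, expected to lie in [0, 1].
    """
    return [
        [alpha * appearance[t][d] + (1.0 - alpha) * iou(track_boxes[t], det_boxes[d])
         for d in range(len(det_boxes))]
        for t in range(len(track_boxes))
    ]


def greedy_match(scores: List[List[float]],
                 min_score: float = 0.3) -> List[Tuple[int, int]]:
    """Pair tracklets and detections one-to-one, highest fused score first."""
    candidates = sorted(
        ((s, t, d) for t, row in enumerate(scores) for d, s in enumerate(row)
         if s >= min_score),
        reverse=True,
    )
    used_tracks, used_dets, matches = set(), set(), []
    for _, t, d in candidates:
        if t in used_tracks or d in used_dets:
            continue
        used_tracks.add(t)
        used_dets.add(d)
        matches.append((t, d))
    return sorted(matches)


if __name__ == "__main__":
    tracks = [(0, 0, 10, 10), (20, 20, 30, 30)]
    dets = [(21, 21, 31, 31), (1, 1, 11, 11)]
    appearance = [[0.1, 0.9], [0.8, 0.2]]
    print(greedy_match(fuse_scores(appearance, tracks, dets)))
```

In this toy example both cues agree, so tracklet 0 pairs with detection 1 and tracklet 1 with detection 0. A weight such as `alpha` is one simple way to trade off the two cues when they disagree, for instance when occlusion degrades appearance or camera motion degrades position overlap.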


Pages: 23426-23441

Page count: 16
