STMT: Spatio-temporal memory transformer for multi-object tracking

Cited by: 2
Authors
Gu, Songbo [1 ]
Ma, Jianxin [1 ]
Hui, Guancheng [1 ]
Xiao, Qiyang [1 ]
Shi, Wentao [2 ]
Affiliations
[1] Henan Univ, Sch Artificial Intelligence, Zhengzhou 450001, Henan, Peoples R China
[2] Northwestern Polytech Univ, Sch Marine Sci & Technol, Xian 710072, Shaanxi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Deep learning; Multi-object tracking; Transformer; Memory; Spatio-temporal;
DOI
10.1007/s10489-023-04617-1
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Modern online Multi-Object Tracking (MOT) methods typically first detect objects in each frame and then associate the detections across successive frames. However, high-quality trajectories are difficult to obtain under camera motion, fast object motion, and occlusion. To address these problems, this paper proposes a transformer-based MOT system named Spatio-Temporal Memory Transformer (STMT), which focuses on temporal and historical information. STMT includes a Spatio-Temporal Enhancement Module (STEM) that uses 3D convolution to model the spatial and temporal interactions of objects and to extract features rich in spatio-temporal information. Moreover, a Dynamic Spatio-Temporal Memory (DSTM) is presented to associate detections with tracklets; it contains three units: an Identity Aggregation Module (IAM), a Linear Dynamic Encoder (LD-Encoder), and a memory Decoder (Decoder). The IAM utilizes the geometric changes of objects to reduce the impact of deformation on tracking performance, the LD-Encoder captures the dependencies between objects, and the Decoder generates appearance similarity scores. Furthermore, a Score Fusion Equilibrium Strategy (SFES) is employed to balance the appearance similarity and position distance scores during fusion. Extensive experiments demonstrate that the proposed STMT approach is generally superior to state-of-the-art trackers on the MOT16 and MOT17 benchmarks.
Pages: 23426-23441
Page count: 16
Related papers
50 in total
  • [1] STMT: Spatio-temporal memory transformer for multi-object tracking
    Songbo Gu
    Jianxin Ma
    Guancheng Hui
    Qiyang Xiao
    Wentao Shi
    [J]. Applied Intelligence, 2023, 53 : 23426 - 23441
  • [2] Learning Spatio-Temporal Information for Multi-Object Tracking
    Wei, Jian
    Yang, Mei
    Liu, Feng
    [J]. IEEE ACCESS, 2017, 5 : 3869 - 3877
  • [3] STAT: Multi-Object Tracking Based on Spatio-Temporal Topological Constraints
    Zhang, Junjie
    Wang, Mingyan
    Jiang, Haoran
    Zhang, Xinyu
    Yan, Chenggang
    Zeng, Dan
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 4445 - 4457
  • [4] Spatio-Temporal Correlation Graph for Association Enhancement in Multi-object Tracking
    Zhong, Zhijie
    Sheng, Hao
    Zhang, Yang
    Wu, Yubin
    Chen, Jiahui
    Ke, Wei
    [J]. KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, KSEM 2019, PT I, 2019, 11775 : 394 - 405
  • [5] Spatio-temporal object detection by deep learning: Video-interlacing to improve multi-object tracking
    Mhalla, Ala
    Chateau, Thierry
    Ben Amara, Najoua Essoukri
    [J]. IMAGE AND VISION COMPUTING, 2019, 88 : 120 - 131
  • [6] Spatio-temporal hierarchical feature transformer for UAV object tracking
    Zhu, Fuzhen
    Cui, Jingyi
    Dou, Kaiqi
    [J]. ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING, 2023, 204 : 442 - 452
  • [7] Efficient Multi-object Detection for Complexity Spatio-Temporal Scenes
    Wang, Kai
    Song, Xiangyu
    Sun, Shijie
    Zhao, Juan
    Xu, Cai
    Song, Huansheng
    [J]. WEB AND BIG DATA, PT IV, APWEB-WAIM 2023, 2024, 14334 : 186 - 200
  • [8] ShaSTA: Modeling Shape and Spatio-Temporal Affinities for 3D Multi-Object Tracking
    Sadjadpour, Tara
    Li, Jie
    Ambrus, Rares
    Bohg, Jeannette
    [J]. IEEE ROBOTICS AND AUTOMATION LETTERS, 2024, 9 (05): : 4273 - 4280
  • [9] Exploring reliable infrared object tracking with spatio-temporal fusion transformer
    Qi, Meibin
    Wang, Qinxin
    Zhuang, Shuo
    Zhang, Ke
    Li, Kunyuan
    Liu, Yimin
    Yang, Yanfang
    [J]. KNOWLEDGE-BASED SYSTEMS, 2024, 284
  • [10] Learning Spatio-Temporal Transformer for Visual Tracking
    Yan, Bin
    Peng, Houwen
    Fu, Jianlong
    Wang, Dong
    Lu, Huchuan
    [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 10428 - 10437