Looking Beyond Two Frames: End-to-End Multi-Object Tracking Using Spatial and Temporal Transformers

Cited by: 21
Authors
Zhu, Tianyu [1]
Hiller, Markus [2]
Ehsanpour, Mahsa [3]
Ma, Rongkai [1]
Drummond, Tom [2]
Reid, Ian
Rezatofighi, Hamid [4]
Affiliations
[1] Monash Univ, Dept Elect & Comp Syst Engn, Clayton, Vic 3800, Australia
[2] Univ Melbourne, Sch Comp & Informat Syst, Parkville, Vic 3010, Australia
[3] Univ Adelaide, Australian Inst Machine Learning, Adelaide, SA 5005, Australia
[4] Monash Univ, Dept Data Sci & AI, Clayton, Vic 3800, Australia
Keywords
Tracking; Transformers; Task analysis; Visualization; Object recognition; History; Feature extraction; Multi-object tracking; transformer; spatio-temporal model; pedestrian tracking; end-to-end learning; MULTITARGET;
DOI
10.1109/TPAMI.2022.3213073
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Tracking a time-varying indefinite number of objects in a video sequence over time remains a challenge despite recent advances in the field. Most existing approaches are not able to properly handle multi-object tracking challenges such as occlusion, in part because they ignore long-term temporal information. To address these shortcomings, we present MO3TR: a truly end-to-end Transformer-based online multi-object tracking (MOT) framework that learns to handle occlusions, track initiation and termination without the need for an explicit data association module or any heuristics. MO3TR encodes object interactions into long-term temporal embeddings using a combination of spatial and temporal Transformers, and recursively uses the information jointly with the input data to estimate the states of all tracked objects over time. The spatial attention mechanism enables our framework to learn implicit representations between all the objects and the objects to the measurements, while the temporal attention mechanism focuses on specific parts of past information, allowing our approach to resolve occlusions over multiple frames. Our experiments demonstrate the potential of this new approach, achieving results on par with or better than the current state-of-the-art on multiple MOT metrics for several popular multi-object tracking benchmarks.
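The combination of temporal and spatial attention described in the abstract can be illustrated with a toy sketch. This is not the MO3TR implementation: it assumes plain scaled dot-product attention with no learned projections, no image measurements, and no track initiation or termination logic, and the names `attend`, `temporal_then_spatial`, and `histories` are hypothetical. Each track first summarizes its own past embeddings (temporal step), then the per-track summaries attend to one another (spatial step).

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # Weighted average of the value vectors, one output dimension at a time.
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def temporal_then_spatial(histories):
    """histories[i] holds the past embeddings of track i, oldest first."""
    # Temporal step: the latest embedding of each track attends over
    # that track's own history (tracks may have different lengths).
    summaries = [attend(h[-1], h, h) for h in histories]
    # Spatial step: each track summary attends over all track summaries.
    return [attend(s, summaries, summaries) for s in summaries]


# Hypothetical toy input: two tracks with 2-D embeddings.
histories = [
    [[1.0, 0.0], [0.9, 0.1]],
    [[0.0, 1.0], [0.1, 0.9], [0.2, 0.8]],
]
states = temporal_then_spatial(histories)
```

Because every output is a convex combination of the input embeddings, the resulting states stay within the range of the inputs. A learned model would add projection matrices, multiple heads, and the current frame's features.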
Pages: 12783 - 12797
Number of pages: 15
Related papers
50 in total
  • [1] Tracking Beyond Detection: Learning a Global Response Map for End-to-End Multi-Object Tracking
    Wan, Xingyu
    Cao, Jiakai
    Zhou, Sanping
    Wang, Jinjun
    Zheng, Nanning
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 8222 - 8235
  • [2] Joint Detection and Association for End-to-End Multi-object Tracking
    Li, Ye
    Luo, Xiaoyu
    Shi, Junyu
    Wang, Xinzhong
    Yin, Guangqiang
    Wang, Zhiguo
    NEURAL PROCESSING LETTERS, 2023, 55 (09) : 11823 - 11844
  • [3] Joint Detection and Association for End-to-End Multi-object Tracking
    Ye Li
    Xiaoyu Luo
    Junyu Shi
    Xinzhong Wang
    Guangqiang Yin
    Zhiguo Wang
    Neural Processing Letters, 2023, 55 : 11823 - 11844
  • [4] End-to-End Video Object Detection with Spatial-Temporal Transformers
    He, Lu
    Zhou, Qianyu
    Li, Xiangtai
    Niu, Li
    Cheng, Guangliang
    Li, Xiao
    Liu, Wenxuan
    Tong, Yunhai
    Ma, Lizhuang
    Zhang, Liqing
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 1507 - 1516
  • [5] TransVOD: End-to-End Video Object Detection With Spatial-Temporal Transformers
    Zhou, Qianyu
    Li, Xiangtai
    He, Lu
    Yang, Yibo
    Cheng, Guangliang
    Tong, Yunhai
    Ma, Lizhuang
    Tao, Dacheng
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (06) : 7853 - 7869
  • [6] End-to-End Joint Multi-Object Detection and Tracking for Intelligent Transportation Systems
    Xu, Qing
    Lin, Xuewu
    Cai, Mengchi
    Guo, Yu-ang
    Zhang, Chuang
    Li, Kai
    Li, Keqiang
    Wang, Jianqiang
    Cao, Dongpu
    CHINESE JOURNAL OF MECHANICAL ENGINEERING, 2023, 36 (01)
  • [7] End-to-End Joint Multi-Object Detection and Tracking for Intelligent Transportation Systems
    Qing Xu
    Xuewu Lin
    Mengchi Cai
    Yu-ang Guo
    Chuang Zhang
    Kai Li
    Keqiang Li
    Jianqiang Wang
    Dongpu Cao
    Chinese Journal of Mechanical Engineering, 36
  • [8] End-to-End Joint Multi-Object Detection and Tracking for Intelligent Transportation Systems
    Qing Xu
    Xuewu Lin
    Mengchi Cai
    Yu-ang Guo
    Chuang Zhang
    Kai Li
    Keqiang Li
    Jianqiang Wang
    Dongpu Cao
    Chinese Journal of Mechanical Engineering, 2023, (05) : 295 - 305
  • [9] MOTRv2: Bootstrapping End-to-End Multi-Object Tracking by Pretrained Object Detectors
    Zhang, Yuang
    Wang, Tiancai
    Zhang, Xiangyu
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 22056 - 22065
  • [10] End-to-End Chained Pedestrian Multi-Object Tracking Based on Multi-Feature Fusion
    Zhou, Haiyun
    Xiang, Xuezhi
    Wang, Xinyao
    Ren, Wenkai
    PROCEEDINGS OF 2021 IEEE 12TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING AND SERVICE SCIENCE (ICSESS), 2021, : 150 - 153