TesseTrack: End-to-End Learnable Multi-Person Articulated 3D Pose Tracking

Cited by: 45
Authors
Reddy, N. Dinesh [1]
Guigues, Laurent [2]
Pishchulin, Leonid [2]
Eledath, Jayan [2]
Narasimhan, Srinivasa G. [1]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[2] Amazon, Seattle, WA USA
DOI
10.1109/CVPR46437.2021.01494
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We consider the task of 3D pose estimation and tracking of multiple people seen in an arbitrary number of camera feeds. We propose TesseTrack, a novel top-down approach that simultaneously reasons about multiple individuals' 3D body joint reconstructions and associations in space and time in a single end-to-end learnable framework. At the core of our approach is a novel spatio-temporal formulation that operates in a common voxelized feature space aggregated from single or multiple camera views. After a person detection step, a 4D CNN produces short-term person-specific representations which are then linked across time by a differentiable matcher. The linked descriptions are then merged and deconvolved into 3D poses. This joint spatio-temporal formulation contrasts with previous piecewise strategies that treat 2D pose estimation, 2D-to-3D lifting, and 3D pose tracking as independent sub-problems that are error-prone when solved in isolation. Furthermore, unlike previous methods, TesseTrack is robust to changes in the number of camera views and achieves very good results even if only a single view is available at inference time. Quantitative evaluation of 3D pose reconstruction accuracy on standard benchmarks shows significant improvements over the state of the art. Evaluation of multi-person articulated 3D pose tracking in our novel evaluation framework demonstrates the superiority of TesseTrack over strong baselines.
Pages: 15185-15195
Number of pages: 11