PSVT: End-to-End Multi-person 3D Pose and Shape Estimation with Progressive Video Transformers

Cited by: 14
Authors
Qiu, Zhongwei [1, 3, 4]
Yang, Qiansheng [2 ]
Wang, Jian [2 ]
Feng, Haocheng [2 ]
Han, Junyu [2 ]
Ding, Errui [2 ]
Xu, Chang [3 ]
Fu, Dongmei [1, 4]
Wang, Jingdong [2 ]
Affiliations
[1] Univ Sci & Technol Beijing, Sch Automat & Elect Engn, Beijing, Peoples R China
[2] Baidu, Beijing, Peoples R China
[3] Univ Sydney, Sydney, NSW, Australia
[4] Beijing Engn Res Ctr Ind Spectrum Imaging, Beijing, Peoples R China
DOI
10.1109/CVPR52729.2023.02036
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Existing methods for multi-person video 3D human Pose and Shape Estimation (PSE) typically adopt a two-stage strategy, which first detects human instances in each frame and then performs single-person PSE with a temporal model. However, the global spatio-temporal context among spatial instances cannot be captured. In this paper, we propose a new end-to-end multi-person 3D Pose and Shape estimation framework with a progressive Video Transformer, termed PSVT. In PSVT, a spatio-temporal encoder (STE) captures the global feature dependencies among spatial objects. Then, a spatio-temporal pose decoder (STPD) and a shape decoder (STSD) capture the global dependencies between pose queries and feature tokens and between shape queries and feature tokens, respectively. To handle the variations of objects over time, a novel progressive decoding scheme is used to update pose and shape queries at each frame. In addition, we propose a novel pose-guided attention (PGA) for the shape decoder to better predict shape parameters. These two components strengthen the decoder of PSVT and improve performance. Extensive experiments on four datasets show that PSVT achieves state-of-the-art results.
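The progressive decoding idea in the abstract (queries refined at one frame are carried into the next) can be illustrated with a toy sketch. This is not the authors' implementation: the function names `attend` and `progressive_decode`, the single-head dot-product attention, the residual update, and the plain-list vectors are all assumptions made for illustration, and the sketch omits the encoder, learned projections, and pose-guided attention.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]


def attend(queries, tokens):
    # Hypothetical single-head dot-product attention with no learned weights:
    # each query is mapped to a softmax-weighted mean of the frame's tokens.
    d = len(tokens[0])
    out = []
    for q in queries:
        scores = [sum(qi * ti for qi, ti in zip(q, t)) / math.sqrt(d) for t in tokens]
        w = softmax(scores)
        out.append([sum(wj * t[k] for wj, t in zip(w, tokens)) for k in range(d)])
    return out


def progressive_decode(frames, init_queries):
    # Assumed update rule: decode frame by frame and carry the refined
    # queries forward, so frame t starts from the queries of frame t-1.
    queries = init_queries
    per_frame = []
    for tokens in frames:
        attended = attend(queries, tokens)
        # Residual update of every query with what it attended to.
        queries = [[q + a for q, a in zip(qv, av)] for qv, av in zip(queries, attended)]
        per_frame.append(queries)
    return per_frame


# Two frames with two 2-D feature tokens each, and one initial query.
frames = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [1.0, 1.0]]]
decoded = progressive_decode(frames, [[0.0, 0.0]])
```

Here `decoded[t]` holds the query set after frame `t`. A non-progressive baseline would instead restart from `init_queries` at every frame, which is the contrast the abstract draws.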
Pages: 21254-21263
Number of pages: 10