PSVT: End-to-End Multi-person 3D Pose and Shape Estimation with Progressive Video Transformers

Cited by: 14
Authors
Qiu, Zhongwei [1 ,3 ,4 ]
Yang, Qiansheng [2 ]
Wang, Jian [2 ]
Feng, Haocheng [2 ]
Han, Junyu [2 ]
Ding, Errui [2 ]
Xu, Chang [3 ]
Fu, Dongmei [1 ,4 ]
Wang, Jingdong [2 ]
Affiliations
[1] University of Science and Technology Beijing, School of Automation and Electrical Engineering, Beijing, People's Republic of China
[2] Baidu, Beijing, People's Republic of China
[3] University of Sydney, Sydney, NSW, Australia
[4] Beijing Engineering Research Center of Industrial Spectrum Imaging, Beijing, People's Republic of China
DOI
10.1109/CVPR52729.2023.02036
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Existing methods for multi-person video 3D human Pose and Shape Estimation (PSE) typically adopt a two-stage strategy, which first detects human instances in each frame and then performs single-person PSE with a temporal model. However, this strategy cannot capture the global spatio-temporal context among spatial instances. In this paper, we propose a new end-to-end multi-person 3D Pose and Shape estimation framework with a progressive Video Transformer, termed PSVT. In PSVT, a spatio-temporal encoder (STE) captures the global feature dependencies among spatial objects. Then, a spatio-temporal pose decoder (STPD) and a spatio-temporal shape decoder (STSD) capture the global dependencies between pose queries and feature tokens, and between shape queries and feature tokens, respectively. To handle the variations of objects over time, a novel progressive decoding scheme is used to update the pose and shape queries at each frame. In addition, we propose a novel pose-guided attention (PGA) for the shape decoder to better predict shape parameters. These two components strengthen the decoder of PSVT and improve performance. Extensive experiments on four datasets show that PSVT achieves state-of-the-art results.
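The progressive decoding idea in the abstract, where queries are updated at each frame, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the update rule, so the single-head attention without learned projections, the residual update, and the names `cross_attention` and `progressive_decode` are all assumptions made for illustration. Pose-guided attention is not modelled here.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, tokens):
    """Scaled dot-product attention of queries over feature tokens.

    Assumption: one head, no learned query/key/value projections.
    """
    dim = len(tokens[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, t)) / math.sqrt(dim) for t in tokens]
        weights = softmax(scores)
        out.append([sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(dim)])
    return out


def progressive_decode(init_queries, frames):
    """Carry queries across frames: the output at frame t seeds frame t+1.

    Assumption: a plain residual update (query + attended features).
    """
    queries = init_queries
    per_frame = []
    for tokens in frames:
        attended = cross_attention(queries, tokens)
        queries = [[qi + ai for qi, ai in zip(q, a)] for q, a in zip(queries, attended)]
        per_frame.append(queries)
    return per_frame


# Hypothetical toy input: one query, two frames with two 2-D tokens each.
frames = [[[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 2.0]]]
history = progressive_decode([[0.0, 0.0]], frames)
```

The point of the sketch is only the data flow: the queries returned for one frame are the starting queries for the next, so each frame's estimate builds on the previous one instead of restarting from fixed queries.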
Pages: 21254-21263
Page count: 10
Related papers
50 in total
  • [1] End-to-End Multi-Person Pose Estimation with Transformers
    Shi, Dahu
    Wei, Xing
    Li, Liangqi
    Ren, Ye
    Tan, Wenming
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 11059 - 11068
  • [2] Person-in-WiFi 3D: End-to-End Multi-Person 3D Pose Estimation with Wi-Fi
    Yan, Kangwei
    Wang, Fei
    Qian, Bo
    Ding, Han
    Han, Jinsong
    Wei, Xing
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024, 2024, : 969 - 978
  • [3] TesseTrack: End-to-End Learnable Multi-Person Articulated 3D Pose Tracking
    Reddy, N. Dinesh
    Guigues, Laurent
    Pishchulin, Leonid
    Eledath, Jayan
    Narasimhan, Srinivasa G.
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 15185 - 15195
  • [4] Group Pose: A Simple Baseline for End-to-End Multi-person Pose Estimation
    Liu, Huan
    Chen, Qiang
    Tan, Zichang
    Liu, Jiang-Jiang
    Wang, Jian
    Su, Xiangbo
    Li, Xiaolong
    Yao, Kun
    Han, Junyu
    Ding, Errui
    Zhao, Yao
    Wang, Jingdong
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 14983 - 14992
  • [5] CPFormer: End-to-End Multi-Person Human Pose Estimation From Raw Radar Cubes With Transformers
    Chen, Lin
    Wang, Guoli
    IEEE SENSORS JOURNAL, 2025, 25 (07) : 12466 - 12478
  • [6] Multi-Person Absolute 3D Pose and Shape Estimation from Video
    Zhang, Kaifu
    Li, Yihui
    Guan, Yisheng
    Xi, Ning
    INTELLIGENT ROBOTICS AND APPLICATIONS, ICIRA 2021, PT III, 2021, 13015 : 189 - 200
  • [7] EFCPose: End-to-End Multi-Person Pose Estimation With Fully Convolutional Heads
    Wang, Haixin
    Zhou, Lu
    Chen, Yingying
    Chen, Zhiyang
    Tang, Ming
    Wang, Jinqiao
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (07) : 6039 - 6050
  • [8] E2Pose: Fully Convolutional Networks for End-to-End Multi-Person Pose Estimation
    Tobeta, Masakazu
    Sawada, Yoshihide
    Zheng, Ze
    Takamuku, Sawa
    Natori, Naotake
    2022 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2022, : 532 - 537
  • [9] End-to-End Feature Pyramid Network for Real-Time Multi-Person Pose Estimation
    Luo, Dingli
    Du, Songlin
    Ikenaga, Takeshi
    PROCEEDINGS OF MVA 2019 16TH INTERNATIONAL CONFERENCE ON MACHINE VISION APPLICATIONS (MVA), 2019,
  • [10] End-to-end 3D Human Pose Estimation with Transformer
    Zhang, Bowei
    Cui, Peng
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 4529 - 4536