PSVT: End-to-End Multi-person 3D Pose and Shape Estimation with Progressive Video Transformers

Cited by: 14

Authors:

Qiu, Zhongwei [1,3,4]

Yang, Qiansheng [2]

Wang, Jian [2]

Feng, Haocheng [2]

Han, Junyu [2]

Ding, Errui [2]

Xu, Chang [3]

Fu, Dongmei [1,4]

Wang, Jingdong [2]

Affiliations:

[1] Univ Sci & Technol Beijing, Sch Automat & Elect Engn, Beijing, Peoples R China

[2] Baidu, Beijing, Peoples R China

[3] Univ Sydney, Sydney, NSW, Australia

[4] Beijing Engn Res Ctr Ind Spectrum Imaging, Beijing, Peoples R China

Source:

2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2023

Keywords:

DOI:

10.1109/CVPR52729.2023.02036

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Existing methods for multi-person video 3D human Pose and Shape Estimation (PSE) typically adopt a two-stage strategy, which first detects human instances in each frame and then performs single-person PSE with a temporal model. However, the global spatio-temporal context among spatial instances cannot be captured. In this paper, we propose a new end-to-end multi-person 3D Pose and Shape estimation framework with a progressive Video Transformer, termed PSVT. In PSVT, a spatio-temporal encoder (STE) captures the global feature dependencies among spatial objects. Then, a spatio-temporal pose decoder (STPD) and a spatio-temporal shape decoder (STSD) capture the global dependencies between pose queries and feature tokens, and between shape queries and feature tokens, respectively. To handle the variation of objects over time, a novel progressive decoding scheme updates the pose and shape queries at each frame. In addition, we propose a novel pose-guided attention (PGA) for the shape decoder to better predict shape parameters. These two components strengthen the decoder of PSVT and improve performance. Extensive experiments on four datasets show that PSVT achieves state-of-the-art results.
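
The progressive decoding idea in the abstract (queries updated at each frame) can be illustrated with a small sketch. This is not the authors' implementation: the function names `cross_attention` and `progressive_decode` are hypothetical, the attention has no learned projections, and the toy token values are made up. It only shows the control flow in which queries refined on one frame become the starting queries for the next.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, tokens):
    """Scaled dot-product attention of queries over one frame's tokens.

    Simplification (assumption): no learned projections, no residual;
    each updated query is an attention-weighted average of the tokens.
    """
    dim = len(tokens[0])
    updated = []
    for q in queries:
        scores = [
            sum(a * b for a, b in zip(q, tok)) / math.sqrt(dim)
            for tok in tokens
        ]
        weights = softmax(scores)
        updated.append(
            [sum(w * tok[i] for w, tok in zip(weights, tokens))
             for i in range(dim)]
        )
    return updated


def progressive_decode(initial_queries, frames):
    """Carry queries forward in time: frame t starts from frame t-1's output."""
    queries = initial_queries
    per_frame = []
    for tokens in frames:
        queries = cross_attention(queries, tokens)
        per_frame.append(queries)
    return per_frame


# Toy input: two frames of 2-D feature tokens and a single query.
frames = [[[1.0, 0.0], [0.0, 1.0]], [[2.0, 2.0]]]
out = progressive_decode([[1.0, 0.0]], frames)
```

In this sketch the query at the second frame is computed from the query produced at the first frame, rather than from a fixed initial query, which is the only aspect of the described scheme it is meant to convey.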

Pages: 21254-21263

Page count: 10

50 records in total

[1] End-to-End Multi-Person Pose Estimation with Transformers
Shi, Dahu
Wei, Xing
Li, Liangqi
Ren, Ye
Tan, Wenming
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 11059 - 11068
[2] Person-in-WiFi 3D: End-to-End Multi-Person 3D Pose Estimation with Wi-Fi
Yan, Kangwei
Wang, Fei
Qian, Bo
Ding, Han
Han, Jinsong
Wei, Xing
2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024, 2024, : 969 - 978
[3] TesseTrack: End-to-End Learnable Multi-Person Articulated 3D Pose Tracking
Reddy, N. Dinesh
Guigues, Laurent
Pishchulin, Leonid
Eledath, Jayan
Narasimhan, Srinivasa G.
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 15185 - 15195
[4] Group Pose: A Simple Baseline for End-to-End Multi-person Pose Estimation
Liu, Huan
Chen, Qiang
Tan, Zichang
Liu, Jiang-Jiang
Wang, Jian
Su, Xiangbo
Li, Xiaolong
Yao, Kun
Han, Junyu
Ding, Errui
Zhao, Yao
Wang, Jingdong
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 14983 - 14992
[5] CPFormer: End-to-End Multi-Person Human Pose Estimation From Raw Radar Cubes With Transformers
Chen, Lin
Wang, Guoli
IEEE SENSORS JOURNAL, 2025, 25 (07) : 12466 - 12478
[6] Multi-Person Absolute 3D Pose and Shape Estimation from Video
Zhang, Kaifu
Li, Yihui
Guan, Yisheng
Xi, Ning
INTELLIGENT ROBOTICS AND APPLICATIONS, ICIRA 2021, PT III, 2021, 13015 : 189 - 200
[7] EFCPose: End-to-End Multi-Person Pose Estimation With Fully Convolutional Heads
Wang, Haixin
Zhou, Lu
Chen, Yingying
Chen, Zhiyang
Tang, Ming
Wang, Jinqiao
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (07) : 6039 - 6050
[8] E2Pose: Fully Convolutional Networks for End-to-End Multi-Person Pose Estimation
Tobeta, Masakazu
Sawada, Yoshihide
Zheng, Ze
Takamuku, Sawa
Natori, Naotake
2022 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2022, : 532 - 537
[9] End-to-End Feature Pyramid Network for Real-Time Multi-Person Pose Estimation
Luo, Dingli
Du, Songlin
Ikenaga, Takeshi
PROCEEDINGS OF MVA 2019 16TH INTERNATIONAL CONFERENCE ON MACHINE VISION APPLICATIONS (MVA), 2019,
[10] End-to-end 3D Human Pose Estimation with Transformer
Zhang, Bowei
Cui, Peng
2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 4529 - 4536