Deep Two-Stream Video Inference for Human Body Pose and Shape Estimation

Cited by: 6

Authors:

Li, Ziwen [1]

Xu, Bo [1]

Huang, Han [1]

Lu, Cheng [2]

Guo, Yandong [1]

Affiliations:

[1] OPPO Res Inst, Palo Alto, CA 94303 USA

[2] Xmotors, Santa Clara, CA USA

Source:

2022 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2022) | 2022

DOI:

10.1109/WACV51458.2022.00071

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Several video-based 3D pose and shape estimation algorithms have been proposed to resolve the temporal inconsistency of single-image-based methods. However, stable and accurate reconstruction remains challenging. In this paper, we propose a new framework, Deep Two-Stream Video Inference for Human Body Pose and Shape Estimation (DTS-VIBE), to generate 3D human pose and mesh from RGB videos. We reformulate the task as a multi-modality problem that fuses RGB and optical flow for more reliable estimation. To fully utilize both sensory modalities (RGB and optical flow), we train a transformer-based two-stream temporal network to predict SMPL parameters. The supplementary modality, optical flow, helps maintain temporal consistency by leveraging motion knowledge between two consecutive frames. The proposed algorithm is extensively evaluated on the Human3.6M and 3DPW datasets. The experimental results show that it outperforms other state-of-the-art methods by a significant margin.

Pages: 637-646

Page count: 10
