Deep Two-Stream Video Inference for Human Body Pose and Shape Estimation

Cited by: 6
Authors
Li, Ziwen [1]
Xu, Bo [1]
Huang, Han [1]
Lu, Cheng [2]
Guo, Yandong [1]
Affiliations
[1] OPPO Res Inst, Palo Alto, CA 94303 USA
[2] Xmotors, Santa Clara, CA USA
DOI
10.1109/WACV51458.2022.00071
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Several video-based 3D pose and shape estimation algorithms have been proposed to resolve the temporal inconsistency of single-image-based methods. However, stable and accurate reconstruction remains challenging. In this paper, we propose a new framework, Deep Two-Stream Video Inference for Human Body Pose and Shape Estimation (DTS-VIBE), to generate 3D human pose and mesh from RGB videos. We reformulate the task as a multi-modality problem that fuses RGB and optical flow for more reliable estimation. To fully utilize both sensory modalities (RGB and optical flow), we train a transformer-based two-stream temporal network to predict SMPL parameters. The supplementary modality, optical flow, helps to maintain temporal consistency by leveraging motion knowledge between two consecutive frames. The proposed algorithm is extensively evaluated on the Human3.6M and 3DPW datasets. The experimental results show that it outperforms other state-of-the-art methods by a significant margin.
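The fusion idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, feature dimensions, single self-attention layer, concatenation-based fusion, and the 85-value output (assumed here to be 72 pose, 10 shape, and 3 camera values, following a common SMPL regression convention) are all assumptions made for illustration. It also assumes that the optical-flow features have already been aligned to one vector per frame.

```python
import numpy as np


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over a (T, d) sequence."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


def init_params(d_rgb, d_flow, d_model, n_out=85, seed=0):
    """Random weights for the sketch (hypothetical sizes, untrained)."""
    rng = np.random.default_rng(seed)

    def mat(rows, cols):
        return rng.normal(scale=0.02, size=(rows, cols))

    return {
        "w_in": mat(d_rgb + d_flow, d_model),
        "wq": mat(d_model, d_model),
        "wk": mat(d_model, d_model),
        "wv": mat(d_model, d_model),
        "w_out": mat(d_model, n_out),
    }


def two_stream_smpl(rgb_feats, flow_feats, params):
    """Fuse per-frame RGB and flow features, then regress per-frame parameters.

    rgb_feats:  (T, d_rgb) features from an RGB backbone (assumed given)
    flow_feats: (T, d_flow) features from a flow backbone (assumed given)
    returns:    (T, n_out) parameter vector per frame
    """
    fused = np.concatenate([rgb_feats, flow_feats], axis=-1)  # simple fusion
    h = fused @ params["w_in"]
    # Residual temporal attention: each frame attends to all frames in the clip.
    h = h + self_attention(h, params["wq"], params["wk"], params["wv"])
    return h @ params["w_out"]
```

For a clip of T frames, `two_stream_smpl` returns one parameter vector per frame; a trained model would replace the random weights and would typically stack several attention layers.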
Pages: 637-646
Page count: 10