3D Human Pose and Shape Reconstruction From Videos via Confidence-Aware Temporal Feature Aggregation

Cited by: 5
Authors
Zhang, Hongrun [1]
Meng, Yanda [1]
Zhao, Yitian [2]
Qian, Xuesheng [3]
Qiao, Yihong [3]
Yang, Xiaoyun [4]
Zheng, Yalin [1]
Affiliations
[1] Univ Liverpool, Inst Life Course & Med Sci, Liverpool L7 8TX, Merseyside, England
[2] Chinese Acad Sci, Ningbo Inst Mat Technol & Engn, Cixi Inst Biomed Engn, Ningbo 315201, Peoples R China
[3] China IntelliCloud Co, Shanghai, Peoples R China
[4] Remark AI UK Ltd, London SE1 9PD, England
Keywords
Three-dimensional displays; Feature extraction; Shape; Training; Correlation; Solid modeling; Videos; Human pose; temporal estimation; uncertainty
DOI
10.1109/TMM.2022.3167887
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Estimating 3D human body shapes and poses from videos is a challenging computer vision task. The intrinsic temporal information embedded in adjacent frames is helpful in making accurate estimations. Existing approaches learn temporal features of the target frames simply by aggregating features of their adjacent frames, using off-the-shelf deep neural networks. Consequently, these approaches cannot explicitly and effectively use the correlations between adjacent frames to help infer the parameters of the target frames. In this paper, we propose a novel framework that measures the correlations among adjacent frames in the form of an estimated confidence metric. The confidence value indicates to what extent the adjacent frames can help predict the target frames' 3D shapes and poses. Based on the estimated confidence values, temporally aggregated features are then obtained by adaptively allocating different weights to the temporal predicted features from the adjacent frames. The final 3D shapes and poses are estimated by regressing from the temporally aggregated features. Experimental results on three benchmark datasets show that the proposed method outperforms state-of-the-art approaches (even without the motion priors involved in training). In particular, the proposed method is more robust against corrupted frames.
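The confidence-weighted aggregation described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: the function names `softmax` and `aggregate_features` are hypothetical, confidences are treated as unnormalized scalar scores (one per adjacent frame), and they are turned into weights with a softmax. The paper's actual network, confidence estimator, and normalization may differ.

```python
import math


def softmax(scores):
    """Convert unnormalized scores into weights that sum to 1 (assumed normalization)."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_features(frame_features, confidences):
    """Weighted sum of per-frame feature vectors.

    frame_features: list of equal-length feature vectors, one per adjacent frame.
    confidences: list of scalar confidence scores, one per frame (hypothetical input;
                 in the paper these are estimated by the framework itself).
    Returns (aggregated_feature, weights).
    """
    if not frame_features or len(frame_features) != len(confidences):
        raise ValueError("need one confidence score per frame feature")
    weights = softmax(confidences)
    dim = len(frame_features[0])
    aggregated = [0.0] * dim
    for weight, feature in zip(weights, frame_features):
        if len(feature) != dim:
            raise ValueError("all feature vectors must have the same length")
        for i, value in enumerate(feature):
            aggregated[i] += weight * value
    return aggregated, weights


if __name__ == "__main__":
    # Two clean frames and one corrupted frame that receives a low confidence score.
    features = [[1.0, 2.0], [1.2, 2.2], [9.0, -7.0]]
    scores = [2.0, 2.0, -6.0]
    aggregated, weights = aggregate_features(features, scores)
    print(weights)
    print(aggregated)
```

In this toy setting, a frame with a much lower confidence score contributes almost nothing to the aggregated feature, which is one plausible reading of why a confidence-aware scheme would be more robust to corrupted frames than uniform averaging.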
Pages: 3868-3880
Number of pages: 13
Related papers
50 in total
  • [21] DR-Net: denoising and reconstruction network for 3D human pose estimation from monocular RGB videos
    Chang, J. Y.
    ELECTRONICS LETTERS, 2018, 54 (02) : 70 - 72
  • [22] On Boosting Single-Frame 3D Human Pose Estimation via Monocular Videos
    Li, Zhi
    Wang, Xuan
    Wang, Fei
    Jiang, Peilin
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 2192 - 2201
  • [23] LEARNING POSE-AWARE 3D RECONSTRUCTION VIA 2D-3D SELF-CONSISTENCY
    Liao, Yi-Lun
    Yang, Yao-Cheng
    Lin, Yuan-Fang
    Chen, Pin-Jung
    Kuo, Chia-Wen
    Chiu, Wei-Chen
    Wang, Yu-Chiang Frank
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 3857 - 3861
  • [24] Pose Locality Constrained Representation for 3D Human Pose Reconstruction
    Fan, Xiaochuan
    Zheng, Kang
    Zhou, Youjie
    Wang, Song
    COMPUTER VISION - ECCV 2014, PT I, 2014, 8689 : 174 - 188
  • [25] A 3D shape descriptor for human pose recovery
    Gond, Laetitia
    Sayd, Patrick
    Chateau, Thierry
    Dhome, Michel
    ARTICULATED MOTION AND DEFORMABLE OBJECTS, PROCEEDINGS, 2008, 5098 : 370 - +
  • [26] Personalized 3D Human Pose and Shape Refinement
    Wehrbein, Tom
    Rosenhahn, Bodo
    Matthews, Iain
    Stoll, Carsten
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, ICCVW, 2023, : 4191 - 4201
  • [27] Automatic Feature Detection for 3D Surface Reconstruction from HDTV Endoscopic Videos
    Groch, Anja
    Baumhauer, Matthias
    Meinzer, Hans-Peter
    Maier-Hein, Lena
    MEDICAL IMAGING 2010: VISUALIZATION, IMAGE-GUIDED PROCEDURES, AND MODELING, 2010, 7625
  • [28] Pose estimation and 3D reconstruction of vehicles from stereo-images using a subcategory-aware shape prior
    Coenen, Max
    Rottensteiner, Franz
    ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING, 2021, 181 : 27 - 47
  • [29] Multi-scale Feature Injection for Occluded 3D Human Pose and Shape Estimation
    Shi, Yunhui
    Ge, Yangyang
    Wang, Jin
    2023 35TH CHINESE CONTROL AND DECISION CONFERENCE, CCDC, 2023, : 4881 - 4886
  • [30] An Efficient 3d Head Pose Inference from Videos
    Dahmane, Mohamed
    Meunier, Jean
    IMAGE AND SIGNAL PROCESSING, PROCEEDINGS, 2010, 6134 : 368 - 375