3D Human Pose and Shape Reconstruction From Videos via Confidence-Aware Temporal Feature Aggregation

Cited by: 5
Authors
Zhang, Hongrun [1 ]
Meng, Yanda [1 ]
Zhao, Yitian [2 ]
Qian, Xuesheng [3 ]
Qiao, Yihong [3 ]
Yang, Xiaoyun [4 ]
Zheng, Yalin [1 ]
Affiliations
[1] Univ Liverpool, Inst Life Course & Med Sci, Liverpool L7 8TX, Merseyside, England
[2] Chinese Acad Sci, Ningbo Inst Mat Technol & Engn, Cixi Inst Biomed Engn, Ningbo 315201, Peoples R China
[3] China IntelliCloud Co, Shanghai, Peoples R China
[4] Remark AI UK Ltd, London SE1 9PD, England
Keywords
Three-dimensional displays; Feature extraction; Shape; Training; Correlation; Solid modeling; Videos; Human pose; Temporal estimation; Uncertainty
DOI
10.1109/TMM.2022.3167887
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Estimating 3D human body shapes and poses from videos is a challenging computer vision task. The intrinsic temporal information embedded in adjacent frames is helpful in making accurate estimations. Existing approaches learn temporal features of the target frames simply by aggregating features of their adjacent frames, using off-the-shelf deep neural networks. Consequently, these approaches cannot explicitly and effectively use the correlations between adjacent frames to help infer the parameters of the target frames. In this paper, we propose a novel framework that measures the correlations among adjacent frames in the form of an estimated confidence metric. The confidence value indicates to what extent the adjacent frames can help predict the target frames' 3D shapes and poses. Based on the estimated confidence values, temporally aggregated features are then obtained by adaptively allocating different weights to the temporally predicted features from the adjacent frames. The final 3D shapes and poses are estimated by regressing from the temporally aggregated features. Experimental results on three benchmark datasets show that the proposed method outperforms state-of-the-art approaches (even without the motion priors involved in training). In particular, the proposed method is more robust against corrupted frames.
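The confidence-weighted aggregation described in the abstract can be illustrated with a minimal sketch. The abstract does not state how confidence values are turned into weights, so the softmax normalization below is an assumption, and the names `softmax` and `aggregate_features` are hypothetical helpers introduced only for illustration; this is not the authors' implementation.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw confidence scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_features(
    features: Sequence[Sequence[float]],
    confidences: Sequence[float],
) -> List[float]:
    """Weighted sum of per-frame feature vectors.

    features:    one feature vector per adjacent frame (equal lengths).
    confidences: one scalar score per adjacent frame; higher means the
                 frame is assumed to be more helpful for the target frame.
    Assumption: weights come from a softmax over the confidences.
    """
    if not features or len(features) != len(confidences):
        raise ValueError("need one confidence per feature vector")
    weights = softmax(confidences)
    aggregated = [0.0] * len(features[0])
    for w, feat in zip(weights, features):
        for i, value in enumerate(feat):
            aggregated[i] += w * value
    return aggregated


# Two clean frames and one corrupted frame with a very low confidence.
frames = [[1.0, 2.0], [1.2, 2.2], [50.0, -40.0]]
scores = [2.0, 2.0, -20.0]
print(aggregate_features(frames, scores))
```

With equal confidences the sketch reduces to plain averaging; a frame with a very low confidence contributes almost nothing, which mirrors the robustness to corrupted frames that the abstract claims.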
Pages: 3868-3880
Number of pages: 13
Related papers
50 in total
  • [31] 3D Face Reconstruction via Feature Point Depth Estimation and Shape Deformation
    Xiao, Quan
    Han, Lihua
    Liu, Peizhong
    2014 22ND INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2014, : 2257 - 2262
  • [32] 3D human pose estimation via human structure-aware fully connected network
    Zhang, Xiaoyan
    Tang, Zhenhua
    Hou, Junhui
    Hao, Yanbin
    PATTERN RECOGNITION LETTERS, 2019, 125 : 404 - 410
  • [33] Learning 3D Human Shape and Pose From Dense Body Parts
    Zhang, Hongwen
    Cao, Jie
    Lu, Guo
    Ouyang, Wanli
    Sun, Zhenan
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (05) : 2610 - 2627
  • [34] Sequential 3D Human Pose and Shape Estimation from Point Clouds
    Wang, Kangkan
    Xie, Jin
    Zhang, Guofeng
    Liu, Lei
    Yang, Jian
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, : 7273 - 7282
  • [35] 3D Human Body Shape and Pose Estimation from Depth Image
    Liu, Lei
    Wang, Kangkan
    Yang, Jian
    PATTERN RECOGNITION AND COMPUTER VISION, PT I, PRCV 2020, 2020, 12305 : 410 - 421
  • [36] From Human Pose Similarity Metric to 3D Human Pose Estimator: Temporal Propagating LSTM Networks
    Lee, Kyoungoh
    Kim, Woojae
    Lee, Sanghoon
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (02) : 1781 - 1797
  • [37] Relation-aware interaction spatio-temporal network for 3D human pose estimation
    Zhang, Hehao
    Hu, Zhengping
    Bi, Shuai
    Di, Jirui
    Sun, Zhe
    DIGITAL SIGNAL PROCESSING, 2024, 155
  • [38] 3D surface reconstruction from endoscopic videos
    Kaufman, Arie
    Wang, Jianning
    VISUALIZATION IN MEDICINE AND LIFE SCIENCES, 2008, : 61 - +
  • [39] LASOR: Learning Accurate 3D Human Pose and Shape via Synthetic Occlusion-Aware Data and Neural Mesh Rendering
    Yang, Kaibing
    Gu, Renshu
    Wang, Maoyu
    Toyoura, Masahiro
    Xu, Gang
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 1938 - 1948
  • [40] Neural Descent for Visual 3D Human Pose and Shape
    Zanfir, Andrei
    Bazavan, Eduard Gabriel
    Zanfir, Mihai
    Freeman, William T.
    Sukthankar, Rahul
    Sminchisescu, Cristian
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 14479 - 14488