3D Human Pose and Shape Reconstruction From Videos via Confidence-Aware Temporal Feature Aggregation

Cited by: 5
Authors
Zhang, Hongrun [1]
Meng, Yanda [1]
Zhao, Yitian [2]
Qian, Xuesheng [3]
Qiao, Yihong [3]
Yang, Xiaoyun [4]
Zheng, Yalin [1]
Affiliations
[1] Univ Liverpool, Inst Life Course & Med Sci, Liverpool L7 8TX, Merseyside, England
[2] Chinese Acad Sci, Ningbo Inst Mat Technol & Engn, Cixi Inst Biomed Engn, Ningbo 315201, Peoples R China
[3] China IntelliCloud Co, Shanghai, Peoples R China
[4] Remark AI UK Ltd, London SE1 9PD, England
Keywords
Three-dimensional displays; Feature extraction; Shape; Training; Correlation; Solid modeling; Videos; Human pose; temporal estimation; uncertainty
DOI
10.1109/TMM.2022.3167887
CLC Classification Number
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Estimating 3D human body shapes and poses from videos is a challenging computer vision task. The intrinsic temporal information embedded in adjacent frames is helpful in making accurate estimations. Existing approaches learn temporal features of the target frames simply by aggregating features of their adjacent frames with off-the-shelf deep neural networks, and consequently cannot explicitly and effectively exploit the correlations between adjacent frames to help infer the parameters of the target frames. In this paper, we propose a novel framework that measures the correlations among adjacent frames in the form of an estimated confidence metric. The confidence value indicates to what extent an adjacent frame can help predict the target frame's 3D shape and pose. Based on the estimated confidence values, temporally aggregated features are obtained by adaptively allocating different weights to the temporally predicted features from the adjacent frames. The final 3D shapes and poses are estimated by regression from the temporally aggregated features. Experimental results on three benchmark datasets show that the proposed method outperforms state-of-the-art approaches (even without motion priors involved in training). In particular, the proposed method is more robust against corrupted frames.
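A minimal sketch, in Python/PyTorch, of the aggregation idea described in the abstract, assuming a learned confidence head and softmax-normalised weights. This is not the authors' released code; the layer sizes, the confidence head, and all names below are illustrative assumptions.

    # Confidence-aware temporal feature aggregation (illustrative sketch):
    # each adjacent frame's feature gets a scalar confidence score, the
    # scores are softmax-normalised, and the target frame's feature is a
    # confidence-weighted sum of the adjacent-frame features.
    import torch
    import torch.nn as nn

    class ConfidenceAggregator(nn.Module):
        def __init__(self, feat_dim: int = 2048):
            super().__init__()
            # Hypothetical confidence head: maps a (target, adjacent)
            # feature pair to one unnormalised confidence score.
            self.conf_head = nn.Sequential(
                nn.Linear(2 * feat_dim, 256),
                nn.ReLU(),
                nn.Linear(256, 1),
            )

        def forward(self, target_feat: torch.Tensor, adj_feats: torch.Tensor):
            # target_feat: (B, D) feature of the target frame
            # adj_feats:   (B, T, D) features predicted from T adjacent frames
            B, T, D = adj_feats.shape
            paired = torch.cat(
                [target_feat.unsqueeze(1).expand(B, T, D), adj_feats], dim=-1
            )                                          # (B, T, 2D)
            conf = self.conf_head(paired).squeeze(-1)  # (B, T) raw confidences
            weights = torch.softmax(conf, dim=1)       # normalised weights
            # Confidence-weighted aggregation of the temporal features;
            # a regressor (not shown) would map agg_feat to shape/pose.
            agg_feat = (weights.unsqueeze(-1) * adj_feats).sum(dim=1)  # (B, D)
            return agg_feat, weights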
Pages: 3868 - 3880
Number of pages: 13
Related Papers
50 records in total
  • [1] Bidirectional temporal feature for 3D human pose and shape estimation from a video
    Sun, Libo
    Tang, Ting
    Qu, Yuke
    Qin, Wenhu
    COMPUTER ANIMATION AND VIRTUAL WORLDS, 2023, 34 (3-4)
  • [2] Self-attentive 3D human pose and shape estimation from videos
    Chen, Yun-Chun
    Piccirilli, Marco
    Piramuthu, Robinson
    Yang, Ming-Hsuan
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2021, 213
  • [3] Kinematics-aware spatial-temporal feature transform for 3D human pose estimation
    Du, Songlin
    Yuan, Zhiwei
    Ikenaga, Takeshi
    PATTERN RECOGNITION, 2024, 150
  • [4] 3D Human Pose, Shape and Texture From Low-Resolution Images and Videos
    Xu, Xiangyu
    Chen, Hao
    Moreno-Noguer, Francesc
    Jeni, Laszlo A.
    De la Torre, Fernando
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (09) : 4490 - 4504
  • [5] Temporal Representation Learning on Monocular Videos for 3D Human Pose Estimation
    Honari, Sina
    Constantin, Victor
    Rhodin, Helge
    Salzmann, Mathieu
    Fua, Pascal
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (05) : 6415 - 6427
  • [6] POCO: 3D Pose and Shape Estimation with Confidence
    Dwivedi, Sai Kumar
    Schmid, Cordelia
    Yi, Hongwei
    Black, Michael J.
    Tzionas, Dimitrios
    2024 INTERNATIONAL CONFERENCE IN 3D VISION, 3DV 2024, 2024, : 85 - 95
  • [7] Confidence-Aware Clustered Landmark Filtering for Hybrid 3D Face Tracking
    Barros, Jilliam Maria Diaz
    Wang, Chen-Yu
    Stricker, Didier
    Rambach, Jason
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 2945 - 2949
  • [8] Real-time 3D human pose and motion reconstruction from monocular RGB videos
    Yiannakides, Anastasios
    Aristidou, Andreas
    Chrysanthou, Yiorgos
    COMPUTER ANIMATION AND VIRTUAL WORLDS, 2019, 30 (3-4)
  • [9] Skeleton-Aware 3D Human Shape Reconstruction From Point Clouds
    Jiang, Haiyong
    Cai, Jianfei
    Zheng, Jianmin
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 5430 - 5440