3D Human Pose Estimation Based on Volumetric Joint Coordinates

Authors
Wan Y. [1]
Song Y. [2]
Liu L. [1]
Affiliations
[1] School of Mathematical Sciences, University of Science and Technology of China, Hefei
[2] Beijing Xuanmi Science and Technology Company, Beijing
Keywords
convolutional network; deep learning; pose estimation; voxel
DOI
10.3724/SP.J.1089.2022.19167
Abstract
Estimating the three-dimensional human pose from a color image of a single person is a fundamental problem in many applications. However, the inaccuracy of the estimates and the ill-posedness of the problem have not been well resolved. A novel deep-learning-based approach for estimating 3D human poses from images is proposed. First, a voxel representation is adopted and poses are represented by joint coordinates. Second, spatial integral regression is used to compute the output of the convolutional network. Finally, that output is fed into a fully connected network for joint training. The proposed algorithm is tested under two standard test protocols of the Human3.6M dataset. Experimental results show that it achieves higher accuracy than most previous methods and generalizes well to the MPI-INF-3DHP dataset. © 2022 Institute of Computing Technology. All rights reserved.
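The integral regression step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common soft-argmax formulation, in which a joint's coordinate is the probability-weighted average of voxel indices; the paper's actual network, normalization, and coordinate conventions are not given in this record. The function name `soft_argmax_3d` and the `(D, H, W)` volume layout are hypothetical choices made for illustration.

```python
import numpy as np

def soft_argmax_3d(volume):
    """Expected (x, y, z) voxel coordinate for one joint.

    volume: array of shape (D, H, W) holding raw scores for a single joint
    (assumed layout: depth, height, width).
    """
    d, h, w = volume.shape
    # Softmax over all voxels; subtracting the max keeps exp() stable.
    probs = np.exp(volume - volume.max())
    probs /= probs.sum()
    # Marginal distributions along each axis.
    pz = probs.sum(axis=(1, 2))
    py = probs.sum(axis=(0, 2))
    px = probs.sum(axis=(0, 1))
    # The "integral": expectation of the index under each marginal.
    x = np.dot(px, np.arange(w))
    y = np.dot(py, np.arange(h))
    z = np.dot(pz, np.arange(d))
    return np.array([x, y, z], dtype=float)

# Example: a sharp peak at (z=2, y=3, x=5) in an otherwise flat volume.
vol = np.zeros((4, 6, 8))
vol[2, 3, 5] = 50.0
joint_xyz = soft_argmax_3d(vol)
```

Because the coordinate is an expectation rather than a hard argmax, it is differentiable and can take sub-voxel values, which is what makes end-to-end training with a subsequent fully connected network plausible.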
Pages: 1411-1419
Number of pages: 8