3D hand pose and shape estimation from monocular RGB via efficient 2D cues

Cited by: 2
Authors
Zhang, Fenghao [1 ]
Zhao, Lin [2 ]
Li, Shengling [1 ]
Su, Wanjuan [2 ]
Liu, Liman [1 ]
Tao, Wenbing [2 ]
Affiliations
[1] South Cent Minzu Univ, Sch Biomed Engn, Hubei Key Lab Med Informat Anal & Tumor Diag & Tre, Wuhan 430074, Peoples R China
[2] Huazhong Univ Sci & Technol, Sch Artificial Intelligence & Automat, Natl Key Lab Sci & Technol Multispectral Informat, Wuhan 430074, Peoples R China
Source
COMPUTATIONAL VISUAL MEDIA | 2024, Vol. 10, Issue 01
Funding
National Natural Science Foundation of China;
Keywords
hand; 3D reconstruction; deep learning; image features; 3D mesh;
DOI
10.1007/s41095-023-0346-4
Chinese Library Classification
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
Estimating 3D hand shape from a single-view RGB image is important for many applications. However, the diversity of hand shapes and postures, depth ambiguity, and occlusion may result in pose errors and noisy hand meshes. Making full use of 2D cues such as 2D pose can effectively improve the quality of 3D human hand shape estimation. In this paper, we use 2D joint heatmaps to obtain spatial details for robust pose estimation. We also introduce a depth-independent 2D mesh to avoid depth ambiguity in mesh regression for efficient hand-image alignment. Our method has four cascaded stages: 2D cue extraction, pose feature encoding, initial reconstruction, and reconstruction refinement. Specifically, we first encode the image to determine semantic features during 2D cue extraction; this is also used to predict hand joints and for segmentation. Then, during the pose feature encoding stage, we use a hand joints encoder to learn spatial information from the joint heatmaps. Next, a coarse 3D hand mesh and 2D mesh are obtained in the initial reconstruction step; a mesh squeeze-and-excitation block is used to fuse different hand features to enhance perception of 3D hand structures. Finally, a global mesh refinement stage learns non-local relations between vertices of the hand mesh from the predicted 2D mesh, to predict an offset hand mesh to fine-tune the reconstruction results. Quantitative and qualitative results on the FreiHAND benchmark dataset demonstrate that our approach achieves state-of-the-art performance.
Pages: 79-96
Page count: 18
Related papers
50 in total
  • [21] Survey on depth and RGB image-based 3D hand shape and pose estimation
    Lin HUANG
    Boshen ZHANG
    Zhilin GUO
    Yang XIAO
    Zhiguo CAO
    Junsong YUAN
    Virtual Reality & Intelligent Hardware, 2021, 3 (03) : 207 - 234
  • [22] Survey on depth and RGB image-based 3D hand shape and pose estimation
    Huang L.
    Zhang B.
    Guo Z.
    Xiao Y.
    Cao Z.
    Yuan J.
    Virtual Reality and Intelligent Hardware, 2021, 3 (03) : 207 - 234
  • [23] Efficient Monocular Pose Estimation for Complex 3D Models
    Rubio, A.
    Villamizar, M.
    Ferraz, L.
    Penate-Sanchez, A.
    Ramisa, A.
    Simo-Serra, E.
    Sanfeliu, A.
    Moreno-Noguer, F.
    2015 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2015, : 1397 - 1402
  • [24] Review on 3D Hand Pose Estimation Based on a RGB Image
    Xiao Y.
    Liu Y.
    Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics, 2024, 36 (02) : 161 - 172
  • [25] Keypoint Fusion for RGB-D Based 3D Hand Pose Estimation
    Liu, Xingyu
    Ren, Pengfei
    Gao, Yuanyuan
    Wang, Jingyu
    Sun, Haifeng
    Qi, Qi
    Zhuang, Zirui
    Liao, Jianxin
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 4, 2024, : 3756 - 3764
  • [26] Monocular 3D Pose Estimation via Pose Grammar and Data Augmentation
    Xu, Yuanlu
    Wang, Wenguan
    Liu, Tengyu
    Liu, Xiaobai
    Xie, Jianwen
    Zhu, Song-Chun
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (10) : 6327 - 6344
  • [27] Model-Based 3D Hand Pose Estimation from Monocular Video
    de La Gorce, Martin
    Fleet, David J.
    Paragios, Nikos
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2011, 33 (09) : 1793 - 1805
  • [28] Graph-Based CNNs With Self-Supervised Module for 3D Hand Pose Estimation From Monocular RGB
    Guo, Shaoxiang
    Rigall, Eric
    Qi, Lin
    Dong, Xinghui
    Li, Haiyan
    Dong, Junyu
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2021, 31 (04) : 1514 - 1525
  • [29] 3D Human Pose Estimation via Deep Learning from 2D annotations
    Brau, Ernesto
    Jiang, Hao
    PROCEEDINGS OF 2016 FOURTH INTERNATIONAL CONFERENCE ON 3D VISION (3DV), 2016, : 582 - 591
  • [30] 3D hand pose retrieval from a single 2D image
    Guan, HY
    Chua, CS
    Ho, YK
    2001 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOL I, PROCEEDINGS, 2001, : 157 - 160