Graph-Based CNNs With Self-Supervised Module for 3D Hand Pose Estimation From Monocular RGB

Cited by: 27
Authors
Guo, Shaoxiang [1]
Rigall, Eric [1]
Qi, Lin [1]
Dong, Xinghui [2]
Li, Haiyan [1]
Dong, Junyu [1]
Affiliations
[1] Ocean Univ China, Dept Informat Sci & Technol, Qingdao 266100, Peoples R China
[2] Univ Manchester, Ctr Imaging Sci, Manchester M13 9PT, Lancs, England
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China;
Keywords
Three-dimensional displays; Pose estimation; Two dimensional displays; Feature extraction; Cameras; Convolutional neural networks; Solid modeling; Computer vision; hand pose estimation; graph CNNs; self-supervision;
DOI
10.1109/TCSVT.2020.3004453
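Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code
0808; 0809;
Abstract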
Hand pose estimation in 3D space from a single RGB image is a highly challenging problem due to self-geometric ambiguities, diverse textures, viewpoints, and self-occlusions. Existing work shows that a network structure with multi-scale resolution subnets fused in parallel can more effectively improve the spatial accuracy of 2D pose estimation. Nevertheless, the features extracted by traditional convolutional neural networks cannot efficiently express the unique topological structure of hand key points, which have discrete and correlated properties. Some applications of hand pose estimation based on traditional convolutional neural networks have demonstrated that the structural similarity between a graph and the hand key points can improve the accuracy of 3D hand pose regression. In this paper, we design and implement an end-to-end network for predicting 3D hand pose from a single RGB image. We first extract multiple feature maps at different resolutions and fuse them in parallel, and then build a graph-based convolutional neural network (GCN) module to predict the initial 3D hand key points. Next, we use 2D spatial relationships and 3D geometric knowledge to build a self-supervised module that eliminates domain gaps between 2D and 3D space. Finally, the final 3D hand pose is calculated by averaging the 3D hand poses from the GCN output and the self-supervised module output. We evaluate the proposed method on two challenging benchmark datasets for 3D hand pose estimation. Experimental results show that the proposed method is effective and achieves state-of-the-art performance on the benchmark datasets.
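The graph-convolution and averaging steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the 21-joint layout (wrist plus four joints per finger), the symmetric adjacency normalization, the layer sizes, the random untrained weights, and the placeholder second pose are all assumptions made for illustration, and the helper names (`hand_adjacency`, `graph_conv`, `fuse_poses`) are hypothetical.

```python
import math
import random

NUM_JOINTS = 21  # assumed layout: wrist (index 0) plus 4 joints on each of 5 fingers


def hand_adjacency(num_joints=NUM_JOINTS):
    """Adjacency matrix (with self-loops) of an assumed 21-joint hand skeleton."""
    adj = [[1.0 if i == j else 0.0 for j in range(num_joints)]
           for i in range(num_joints)]
    for finger in range(5):
        chain = [0] + [1 + 4 * finger + k for k in range(4)]  # wrist -> fingertip
        for a, b in zip(chain, chain[1:]):
            adj[a][b] = adj[b][a] = 1.0
    return adj


def normalize(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2} (an assumed, common GCN choice)."""
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in adj]
    return [[inv_sqrt_deg[i] * v * inv_sqrt_deg[j] for j, v in enumerate(row)]
            for i, row in enumerate(adj)]


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def graph_conv(features, adj_norm, weight, relu=True):
    """One graph convolution: mix neighbouring joints, then apply a linear map."""
    out = matmul(matmul(adj_norm, features), weight)
    if relu:
        out = [[max(v, 0.0) for v in row] for row in out]
    return out


def fuse_poses(pose_a, pose_b):
    """Element-wise average of two (joints x 3) pose estimates."""
    return [[0.5 * (x + y) for x, y in zip(ra, rb)]
            for ra, rb in zip(pose_a, pose_b)]


rng = random.Random(0)
adj_norm = normalize(hand_adjacency())
# Stand-in per-joint features; in the paper these would come from the fused feature maps.
features = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(NUM_JOINTS)]
w_hidden = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(8)]  # random, untrained
w_out = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(16)]     # random, untrained

hidden = graph_conv(features, adj_norm, w_hidden)
pose_gcn = graph_conv(hidden, adj_norm, w_out, relu=False)  # initial 3D key points
# Placeholder for the self-supervised module's output (not modelled here).
pose_other = [[v + 0.1 for v in row] for row in pose_gcn]
final_pose = fuse_poses(pose_gcn, pose_other)
print(len(final_pose), len(final_pose[0]))
```

The sketch only shows the data flow: joint features are propagated along skeleton edges, and the final pose is the mean of two 21 x 3 estimates. The self-supervised module and the multi-resolution feature extractor are not modelled.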
Pages: 1514 - 1525
Page count: 12
Related papers
50 in total
  • [22] 3D hand pose and shape estimation from monocular RGB via efficient 2D cues
    Zhang, Fenghao
    Zhao, Lin
    Li, Shengling
    Su, Wanjuan
    Liu, Liman
    Tao, Wenbing
    COMPUTATIONAL VISUAL MEDIA, 2024, 10 (01): : 79 - 96
  • [23] Model-Based 3D Hand Pose Estimation from Monocular Video
    de La Gorce, Martin
    Fleet, David J.
    Paragios, Nikos
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2011, 33 (09) : 1793 - 1805
  • [24] Self-Supervised Monocular Depth Estimation With 3-D Displacement Module for Laparoscopic Images
    Xu, Chi
    Huang, Baoru
    Elson, Daniel S.
    IEEE TRANSACTIONS ON MEDICAL ROBOTICS AND BIONICS, 2022, 4 (02): : 331 - 334
  • [25] Self-supervised 3D vehicle detection based on monocular images
    Liu, He
    Sun, Yi
    SIGNAL PROCESSING-IMAGE COMMUNICATION, 2024, 127
  • [26] Rotated Orthographic Projection for Self-supervised 3D Human Pose Estimation
    Yao, Yao
    Pan, Yixuan
    Shi, Wenjun
    Zhu, Dongchen
    Wang, Lei
    Li, Jiamao
    COMPUTER VISION - ECCV 2024, PT LXIX, 2025, 15127 : 422 - 439
  • [27] Self-supervised Vision Transformers for 3D pose estimation of novel objects
    Thalhammer, Stefan
    Weibel, Jean-Baptiste
    Vincze, Markus
    Garcia-Rodriguez, Jose
    IMAGE AND VISION COMPUTING, 2023, 139
  • [28] Keypoint Fusion for RGB-D Based 3D Hand Pose Estimation
    Liu, Xingyu
    Ren, Pengfei
    Gao, Yuanyuan
    Wang, Jingyu
    Sun, Haifeng
    Qi, Qi
    Zhuang, Zirui
    Liao, Jianxin
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 4, 2024, : 3756 - 3764
  • [29] 3D Distillation: Improving Self-Supervised Monocular Depth Estimation on Reflective Surfaces
    Shi, Xuepeng
    Dikov, Georgi
    Reitmayr, Gerhard
    Kim, Tae-Kyun
    Ghafoorian, Mohsen
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 9099 - 9109
  • [30] 3D Hand Shape and Pose Estimation from a Single RGB Image
    Ge, Liuhao
    Ren, Zhou
    Li, Yuncheng
    Xue, Zehao
    Wang, Yingying
    Cai, Jianfei
    Yuan, Junsong
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 10825 - 10834