Graph-Based CNNs With Self-Supervised Module for 3D Hand Pose Estimation From Monocular RGB

Cited by: 27

Authors:

Guo, Shaoxiang [1]

Rigall, Eric [1]

Qi, Lin [1]

Dong, Xinghui [2]

Li, Haiyan [1]

Dong, Junyu [1]

Affiliations:

[1] Ocean Univ China, Dept Informat Sci & Technol, Qingdao 266100, Peoples R China

[2] Univ Manchester, Ctr Imaging Sci, Manchester M13 9PT, Lancs, England

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2021 / Vol. 31 / No. 4

Funding:

National Key Research and Development Program of China; National Natural Science Foundation of China;

Keywords:

Three-dimensional displays; Pose estimation; Two dimensional displays; Feature extraction; Cameras; Convolutional neural networks; Solid modeling; Computer vision; hand pose estimation; graph CNNs; self-supervision;

DOI:

10.1109/TCSVT.2020.3004453

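CLC (Chinese Library Classification) numbers:

TM [Electrical Engineering]; TN [Electronic and Communication Technology];

Subject classification codes:

0808 ; 0809 ;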

Abstract:

Hand pose estimation in 3D space from a single RGB image is a highly challenging problem due to self-geometric ambiguities, diverse textures and viewpoints, and self-occlusions. Existing work has shown that a network structure whose multi-resolution subnets are fused in parallel can yield more spatially accurate 2D pose estimation. Nevertheless, the features extracted by traditional convolutional neural networks cannot efficiently express the unique topological structure of hand key points, which are discrete yet correlated. Some applications of hand pose estimation based on traditional convolutional neural networks have demonstrated that the structural similarity between a graph and the hand key points can improve the accuracy of 3D hand pose regression. In this paper, we design and implement an end-to-end network for predicting 3D hand pose from a single RGB image. We first extract multiple feature maps at different resolutions and fuse them in parallel, and then build a graph-based convolutional neural network (GCN) module to predict the initial 3D hand key points. Next, we use 2D spatial relationships and 3D geometric knowledge to build a self-supervised module that eliminates the domain gap between 2D and 3D space. Finally, the 3D hand pose is obtained by averaging the 3D hand poses output by the GCN module and by the self-supervised module. We evaluate the proposed method on two challenging benchmark datasets for 3D hand pose estimation. Experimental results demonstrate the effectiveness of the proposed method, which achieves state-of-the-art performance on these datasets.
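
The abstract does not give layer definitions, so the following is only a minimal sketch, not the authors' implementation. It assumes a standard graph convolution with a symmetrically normalized adjacency (Kipf and Welling style) over a commonly used 21-keypoint hand skeleton. The joint ordering, feature sizes, random untrained weights, and the stand-in for the self-supervised branch (`pose_ssl`) are all hypothetical. It illustrates two ideas from the abstract: propagating per-joint features along bone edges to regress 3D key points, and averaging the two branch outputs into the final pose.

```python
import math
import random

NUM_JOINTS = 21  # assumption: common 21-keypoint hand model (wrist + 4 joints per finger)


def hand_edges():
    """Bone edges of the assumed skeleton: joint 0 is the wrist, then each finger
    (thumb to little finger) contributes 4 joints chained from root to tip."""
    edges = []
    for finger in range(5):
        root = 1 + 4 * finger
        edges.append((0, root))                     # wrist -> finger root
        for k in range(3):
            edges.append((root + k, root + k + 1))  # consecutive joints along the finger
    return edges


def normalized_adjacency(n, edges):
    """D^{-1/2} (A + I) D^{-1/2}: undirected adjacency with self-loops, symmetrically normalized."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(x, y):
    """Plain-Python matrix product of nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def graph_conv(x, a_hat, w, relu=True):
    """One graph-convolution layer: weighted sum of each joint's features and its
    skeletal neighbours' features (a_hat @ x), projected by w, optionally ReLU'd."""
    h = matmul(matmul(a_hat, x), w)
    return [[max(v, 0.0) for v in row] for row in h] if relu else h


def average_poses(p, q):
    """Element-wise mean of two (joints x 3) pose estimates."""
    return [[(u + v) / 2.0 for u, v in zip(r, s)] for r, s in zip(p, q)]


def random_matrix(rows, cols, scale=1.0):
    """Gaussian random matrix used as placeholder features / untrained weights."""
    return [[random.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
a_hat = normalized_adjacency(NUM_JOINTS, hand_edges())
features = random_matrix(NUM_JOINTS, 16)                      # stand-in per-joint image features
w1, w2 = random_matrix(16, 8, 0.1), random_matrix(8, 3, 0.1)  # untrained weights

# Two-layer GCN regressing an initial 3D position (x, y, z) per joint.
pose_gcn = graph_conv(graph_conv(features, a_hat, w1), a_hat, w2, relu=False)
# Hypothetical stand-in for the self-supervised branch's 3D output.
pose_ssl = [[v + random.gauss(0.0, 0.01) for v in row] for row in pose_gcn]
# Final pose: average of the two branch outputs.
pose_final = average_poses(pose_gcn, pose_ssl)
print(len(pose_final), len(pose_final[0]))  # 21 3
```

Symmetric normalization keeps feature magnitudes comparable across joints with different numbers of skeletal neighbours (the wrist has five, each fingertip has one).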


Pages: 1514-1525

Number of pages: 12
