A hybrid network for estimating 3D interacting hand pose from a single RGB image

Authors
Bao, Wenxia [1]
Gao, Qiuyue [1]
Yang, Xianjun [2]
Affiliations
[1] Anhui Univ, Sch Elect & Informat Engn, Hefei 230601, Anhui, Peoples R China
[2] Chinese Acad Sci, Hefei Inst Phys Sci, Hefei 230031, Anhui, Peoples R China
Keywords
3D hand pose estimation; Interacting hand; Hybrid network; End-to-end network
DOI
10.1007/s11760-024-03043-1
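Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic and communication technology]
Discipline classification code
0808; 0809
Abstract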
Estimating 3D interacting hand pose from a single RGB image is a challenging problem: in two-handed interactions the hands tend to occlude each other and are self-similar in appearance. In this study, a simple, accurate end-to-end framework called HybridPoseNet is proposed for estimating 3D interacting hand pose. The hybrid network employs an encoder-decoder architecture. More specifically, the feature encoder is a hybrid structure that combines a convolutional neural network (CNN) with a transformer to encode hand information. An ordinary CNN extracts the local detailed features of a given image, and a vision transformer captures the long-range spatial interactions between feature vectors at different positions. The 3D pose decoder is based on left and right network branches, which are fused via a feature enhancement module (FEM). The FEM helps reduce the ambiguity in appearance caused by the self-similarity of the hands. The decoder lifts the 2D pose to a 3D pose by estimating two depth components. Ablation experiments demonstrate the effectiveness of each module in the network. In addition, comprehensive experiments on the InterHand2.6M dataset show that the proposed method outperforms previous state-of-the-art methods for estimating interacting hand pose.
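The 2D-to-3D lifting step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which two depth components are estimated, so this example assumes a common choice: an absolute depth for the hand root plus a root-relative depth for each joint, combined with a standard pinhole camera model. The function name `lift_to_3d`, its parameters, and the millimetre units are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]


def lift_to_3d(
    joints_2d: Sequence[Point2D],   # (u, v) pixel coordinates per joint
    rel_depths: Sequence[float],    # ASSUMED: per-joint depth relative to the root, in mm
    root_depth: float,              # ASSUMED: absolute depth of the root joint, in mm
    focal: Tuple[float, float],     # (fx, fy) focal lengths in pixels
    principal: Tuple[float, float], # (cx, cy) principal point in pixels
) -> List[Point3D]:
    """Back-project 2D joints to 3D camera coordinates with a pinhole model."""
    fx, fy = focal
    cx, cy = principal
    joints_3d: List[Point3D] = []
    for (u, v), dz in zip(joints_2d, rel_depths):
        z = root_depth + dz        # absolute depth of this joint
        x = (u - cx) * z / fx      # inverse pinhole projection, horizontal axis
        y = (v - cy) * z / fy      # inverse pinhole projection, vertical axis
        joints_3d.append((x, y, z))
    return joints_3d


# Illustrative values only: a root joint at the image centre and one other joint.
joints = lift_to_3d(
    joints_2d=[(128.0, 128.0), (178.0, 128.0)],
    rel_depths=[0.0, 50.0],
    root_depth=500.0,
    focal=(1000.0, 1000.0),
    principal=(128.0, 128.0),
)
```

Splitting depth this way is one plausible reading: root-relative depth can be inferred from hand appearance, whereas absolute depth depends on the camera setup. The paper may define its two components differently.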
Pages: 3801-3814
Page count: 14
Journal: Signal, Image and Video Processing, 2024, 18: 3801-3814