MPCTrans: Multi-Perspective Cue-Aware Joint Relationship Representation for 3D Hand Pose Estimation via Swin Transformer

Times cited: 0
Authors
Wan, Xiangan [1]
Ju, Jianping [1]
Tang, Jianying [1]
Lin, Mingyu [1]
Rao, Ning [1]
Chen, Deng [2]
Liu, Tingting [1]
Li, Jing [1]
Bian, Fan [1]
Xiong, Nicholas [1]
Affiliations
[1] Hubei Business Coll, Sch Comp Sci & Technol, Wuhan 430079, Peoples R China
[2] Wuhan Inst Technol, Hubei Prov Key Lab Intelligent Robot, Wuhan 430079, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
depth image; 3D hand pose estimation; multi-perspective cues; Swin Transformer; deep learning; REGRESSION; NETWORK;
DOI
10.3390/s24217029
Chinese Library Classification (CLC) number
O65 [Analytical Chemistry];
Discipline classification code
070302 ; 081704 ;
Abstract
The objective of 3D hand pose estimation (HPE) from depth images is to accurately locate the keypoints of the hand. However, the task remains challenging because hand appearance varies considerably across viewpoints and hands are often severely occluded. To address these challenges, this study introduces a novel approach, the multi-perspective cue-aware joint relationship representation for 3D HPE via the Swin Transformer (MPCTrans for short), which is designed to learn multi-perspective cues and essential information from hand depth images. To this end, three novel modules are proposed to exploit features from multiple virtual views of the hand: the adaptive virtual multi-viewpoint (AVM), hierarchy feature estimation (HFE), and virtual viewpoint evaluation (VVE) modules. The AVM module adaptively adjusts the angles of the virtual viewpoints and learns ideal viewpoints for generating informative virtual views. The HFE module estimates hand keypoints through hierarchical feature extraction. The VVE module evaluates the virtual viewpoints using chained high-level functions from the HFE module. A Swin Transformer serves as the backbone to capture long-range semantic relationships among joints in hand depth images. Extensive experiments show that MPCTrans achieves state-of-the-art performance on four challenging benchmark datasets.
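The abstract does not state how the AVM module renders a virtual view. One common way to obtain a virtual view from a single depth image, assumed here rather than taken from the paper, is to back-project the pixels to a 3D point cloud with pinhole intrinsics, rotate the cloud about its centroid, and re-project it with a z-buffer. The helper names `rotation_matrix` and `render_virtual_view`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the azimuth/elevation parameterization are all hypothetical. In MPCTrans the viewpoint angles are learned adaptively; in this sketch they are plain fixed inputs, so it only illustrates the geometry.

```python
import math
from typing import List

# Depth maps are row-major lists of rows; 0.0 marks background / missing depth.
DepthMap = List[List[float]]
Matrix3 = List[List[float]]


def rotation_matrix(azimuth_deg: float, elevation_deg: float) -> Matrix3:
    """Return R = R_x(elevation) @ R_y(azimuth): yaw about the camera y-axis, then pitch about x."""
    a, e = math.radians(azimuth_deg), math.radians(elevation_deg)
    ry = [[math.cos(a), 0.0, math.sin(a)],
          [0.0, 1.0, 0.0],
          [-math.sin(a), 0.0, math.cos(a)]]
    rx = [[1.0, 0.0, 0.0],
          [0.0, math.cos(e), -math.sin(e)],
          [0.0, math.sin(e), math.cos(e)]]
    return [[sum(rx[i][k] * ry[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def render_virtual_view(depth: DepthMap, fx: float, fy: float, cx: float, cy: float,
                        azimuth_deg: float, elevation_deg: float) -> DepthMap:
    """Hypothetical helper: re-render a depth map from a virtual camera orbiting the hand."""
    h, w = len(depth), len(depth[0])

    # 1. Back-project every valid pixel to a 3D point in camera coordinates (pinhole model).
    points = [((u - cx) * z / fx, (v - cy) * z / fy, z)
              for v, row in enumerate(depth)
              for u, z in enumerate(row) if z > 0.0]
    out = [[0.0] * w for _ in range(h)]
    if not points:
        return out

    # 2. Rotate the cloud about its centroid so the hand stays roughly in front of the camera.
    n = len(points)
    center = [sum(p[i] for p in points) / n for i in range(3)]
    r = rotation_matrix(azimuth_deg, elevation_deg)

    for p in points:
        d = [p[i] - center[i] for i in range(3)]
        x, y, z = [sum(r[i][k] * d[k] for k in range(3)) + center[i] for i in range(3)]
        if z <= 0.0:
            continue  # the point moved behind the camera
        # 3. Re-project and keep the nearest point per pixel (z-buffer).
        u = round(fx * x / z + cx)
        v = round(fy * y / z + cy)
        if 0 <= u < w and 0 <= v < h and (out[v][u] == 0.0 or z < out[v][u]):
            out[v][u] = z
    return out
```

With azimuth and elevation both 0 the function reproduces the input depth map up to floating-point error; nonzero angles yield a view of the same hand from a rotated virtual camera. Rotating about the centroid rather than the camera origin keeps the hand near the image center, which is one plausible reason such schemes orbit the object instead of the sensor.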
Pages: 17
Related papers (50 in total)
  • [41] Deep Semantic Graph Transformer for Multi-View 3D Human Pose Estimation
    Zhang, Lijun
    Zhou, Kangkang
    Lu, Feng
    Zhou, Xiang-Dong
    Shi, Yu
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 7, 2024, : 7205 - 7214
  • [42] Efficient Hierarchical Multi-view Fusion Transformer for 3D Human Pose Estimation
    Zhou, Kangkang
    Zhang, Lijun
    Lu, Feng
    Zhou, Xiang-Dong
    Shi, Yu
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 7512 - 7520
  • [43] Weakly-Supervised Discovery of Geometry-Aware Representation for 3D Human Pose Estimation
    Chen, Xipeng
    Lin, Kwan-Yee
    Liu, Wentao
    Qian, Chen
    Lin, Liang
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 10887 - 10896
  • [44] Fast and Accurate 3D Hand Pose Estimation via Recurrent Neural Network for Capturing Hand Articulations
    Yoo, Cheol-Hwan
    Ji, Seowon
    Shin, Yong-Goo
    Kim, Seung-Wook
    Ko, Sung-Jea
    IEEE ACCESS, 2020, 8 : 114010 - 114019
  • [45] 3D pose reconstruction with multi-perspective and spatial confidence point group for jump analysis in figure skating
    Tian, L.
    Cheng, X.
    Honda, M.
    Ikenaga, T.
    FIFTH INTERNATIONAL WORKSHOP ON PATTERN RECOGNITION, 2020, 11526
  • [46] Image-free Domain Generalization via CLIP for 3D Hand Pose Estimation
    Lee, Seongyeong
    Park, Hansoo
    Kim, Dong Uk
    Kim, Jihyeon
    Boboev, Muhammadjon
    Baek, Seungryul
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 2933 - 2943
  • [47] 3D Hand Pose Estimation via aligned latent space injection and kinematic losses
    Stergioulas, Andreas
    Chatzis, Theocharis
    Konstantinidis, Dimitrios
    Dimitropoulos, Kosmas
    Daras, Petros
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021, : 1730 - 1739
  • [48] 3D Hand Pose Estimation with a Single Infrared Camera via Domain Transfer Learning
    Park, Gabyong
    Kim, Tae-Kyun
    Woo, Woontack
    2020 IEEE INTERNATIONAL SYMPOSIUM ON MIXED AND AUGMENTED REALITY (ISMAR 2020), 2020, : 588 - 599
  • [49] LAMP: 3D layered, adaptive-resolution, and multi-perspective panorama - a new scene representation
    Zhu, ZG
    Hanson, AR
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2004, 96 (03) : 294 - 326
  • [50] HOT-Net: Non-Autoregressive Transformer for 3D Hand-Object Pose Estimation
    Huang, Lin
    Tan, Jianchao
    Meng, Jingjing
    Liu, Ji
    Yuan, Junsong
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 3136 - 3145