MPCTrans: Multi-Perspective Cue-Aware Joint Relationship Representation for 3D Hand Pose Estimation via Swin Transformer

Cited by: 0
Authors
Wan, Xiangan [1]
Ju, Jianping [1]
Tang, Jianying [1]
Lin, Mingyu [1]
Rao, Ning [1]
Chen, Deng [2]
Liu, Tingting [1]
Li, Jing [1]
Bian, Fan [1]
Xiong, Nicholas [1]
Affiliations
[1] Hubei Business Coll, Sch Comp Sci & Technol, Wuhan 430079, Peoples R China
[2] Wuhan Inst Technol, Hubei Prov Key Lab Intelligent Robot, Wuhan 430079, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
depth image; 3D hand pose estimation; multi-perspective cues; Swin Transformer; deep learning; REGRESSION; NETWORK;
DOI
10.3390/s24217029
Chinese Library Classification number
O65 [Analytical Chemistry];
Discipline classification code
070302 ; 081704 ;
Abstract
The objective of 3D hand pose estimation (HPE) based on depth images is to accurately locate and predict keypoints of the hand. However, this task remains challenging because of variations in hand appearance across viewpoints and severe occlusions. To address these challenges, this study introduces a novel approach, called the multi-perspective cue-aware joint relationship representation for 3D HPE via the Swin Transformer (MPCTrans, for short). This approach is designed to learn multi-perspective cues and essential information from hand depth images. To achieve this goal, three novel modules are proposed to utilize features from multiple virtual views of the hand, namely, the adaptive virtual multi-viewpoint (AVM), hierarchy feature estimation (HFE), and virtual viewpoint evaluation (VVE) modules. The AVM module adaptively adjusts the angles of the virtual viewpoint and learns the ideal virtual viewpoint to generate informative multiple virtual views. The HFE module estimates hand keypoints through hierarchical feature extraction. The VVE module evaluates virtual viewpoints by using chained high-level functions from the HFE module. A Swin Transformer is used as the backbone to extract the long-range semantic joint relationships in hand depth images. Extensive experiments demonstrate that the MPCTrans model achieves state-of-the-art performance on four challenging benchmark datasets.
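The idea of generating a virtual view from a single depth image can be illustrated with a small geometric sketch. This is not the paper's AVM module, which learns its viewpoint angles; here the angles are fixed inputs. The pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), the function names, and the choice to rotate about the point-cloud centroid are all assumptions made for illustration.

```python
import math

def rotation_yx(yaw, pitch):
    """Return R = R_y(yaw) @ R_x(pitch) as a 3x3 nested list (radians)."""
    c_yaw, s_yaw = math.cos(yaw), math.sin(yaw)
    c_pit, s_pit = math.cos(pitch), math.sin(pitch)
    return [
        [c_yaw, s_yaw * s_pit, s_yaw * c_pit],
        [0.0, c_pit, -s_pit],
        [-s_yaw, c_yaw * s_pit, c_yaw * c_pit],
    ]

def back_project(depth, fx, fy, cx, cy):
    """Convert a depth image (list of rows, 0 = no data) into 3D points."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z > 0:
                points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points

def render_virtual_view(depth, yaw, pitch, fx, fy, cx, cy):
    """Re-render the depth image from a viewpoint rotated about the
    point-cloud centroid (hypothetical helper, not from the paper)."""
    h, w = len(depth), len(depth[0])
    out = [[0.0] * w for _ in range(h)]
    points = back_project(depth, fx, fy, cx, cy)
    if not points:
        return out
    n = len(points)
    center = [sum(p[i] for p in points) / n for i in range(3)]
    rot = rotation_yx(yaw, pitch)
    for p in points:
        d = [p[i] - center[i] for i in range(3)]
        x, y, z = (sum(rot[r][i] * d[i] for i in range(3)) + center[r]
                   for r in range(3))
        if z <= 0:
            continue  # point ended up behind the virtual camera
        u = round(x * fx / z + cx)
        v = round(y * fy / z + cy)
        # Simple z-buffer: keep the nearest surface at each pixel.
        if 0 <= u < w and 0 <= v < h and (out[v][u] == 0 or z < out[v][u]):
            out[v][u] = z
    return out
```

Calling `render_virtual_view` with several `(yaw, pitch)` pairs yields a set of virtual views of the same hand. In a learned setting these angles would be parameters to optimize rather than constants, and the forward splatting used here leaves holes that a real renderer would need to fill.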
Pages: 17
Related papers
50 in total
  • [1] 3D hand pose and mesh estimation via a generic Topology-aware Transformer model
    Yu, Shaoqi
    Wang, Yintong
    Chen, Lili
    Zhang, Xiaolin
    Li, Jiamao
    FRONTIERS IN NEUROROBOTICS, 2024, 18
  • [2] HandGCNFormer: A Novel Topology-Aware Transformer Network for 3D Hand Pose Estimation
    Wang, Yintong
    Chen, LiLi
    Li, Jiamao
    Zhang, Xiaolin
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 5664 - 5673
  • [3] Multi-perspective and multi-modality joint representation and recognition model for 3D action recognition
    Gao, Z.
    Zhang, H.
    Xu, G. P.
    Xue, Y. B.
    NEUROCOMPUTING, 2015, 151 : 554 - 564
  • [4] Orientation Cues-Aware Facial Relationship Representation for Head Pose Estimation via Transformer
    Liu, Hai
    Zhang, Cheng
    Deng, Yongjian
    Liu, Tingting
    Zhang, Zhaoli
    Li, You-Fu
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 6289 - 6302
  • [5] CLIP-Hand3D: Exploiting 3D Hand Pose Estimation via Context-Aware Prompting
    Guo, Shaoxiang
    Cai, Qing
    Qi, Lin
    Dong, Junyu
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 4896 - 4907
  • [6] Generic 3D Representation via Pose Estimation and Matching
    Zamir, Amir R.
    Wekel, Tilman
    Agrawal, Pulkit
    Wei, Colin
    Malik, Jitendra
    Savarese, Silvio
    COMPUTER VISION - ECCV 2016, PT III, 2016, 9907 : 535 - 553
  • [7] Learning Sequential Contexts using Transformer for 3D Hand Pose Estimation
    Khaleghi, Leyla
    Marshall, Joshua
    Etemad, Ali
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 535 - 541
  • [8] HandDAGT: A Denoising Adaptive Graph Transformer for 3D Hand Pose Estimation
    Cheng, Wencan
    Kim, Eunji
    Ko, Jong Hwan
    COMPUTER VISION - ECCV 2024, PT LXXXVIII, 2025, 15146 : 35 - 52
  • [9] Multi-hypothesis representation learning for transformer-based 3D human pose estimation
    Li, Wenhao
    Liu, Hong
    Tang, Hao
    Wang, Pichao
    PATTERN RECOGNITION, 2023, 141
  • [10] A Joint Relationship Aware Neural Network for Single-Image 3D Human Pose Estimation
    Zheng, Xiangtao
    Chen, Xiumei
    Lu, Xiaoqiang
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 : 4747 - 4758