MPCTrans: Multi-Perspective Cue-Aware Joint Relationship Representation for 3D Hand Pose Estimation via Swin Transformer

Cited by: 0
Authors
Wan, Xiangan [1]
Ju, Jianping [1]
Tang, Jianying [1]
Lin, Mingyu [1]
Rao, Ning [1]
Chen, Deng [2]
Liu, Tingting [1]
Li, Jing [1]
Bian, Fan [1]
Xiong, Nicholas [1]
Affiliations
[1] Hubei Business Coll, Sch Comp Sci & Technol, Wuhan 430079, Peoples R China
[2] Wuhan Inst Technol, Hubei Prov Key Lab Intelligent Robot, Wuhan 430079, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
depth image; 3D hand pose estimation; multi-perspective cues; Swin Transformer; deep learning; REGRESSION; NETWORK;
DOI
10.3390/s24217029
Chinese Library Classification number
O65 [Analytical Chemistry];
Discipline classification codes
070302 ; 081704 ;
Abstract
The objective of 3D hand pose estimation (HPE) based on depth images is to accurately locate and predict the keypoints of the hand. This task remains challenging because hand appearance varies across viewpoints and because of severe occlusions. To address these challenges, this study introduces a novel approach called the multi-perspective cue-aware joint relationship representation for 3D HPE via the Swin Transformer (MPCTrans for short). The approach is designed to learn multi-perspective cues and essential information from hand depth images. To this end, three novel modules are proposed to utilize features from multiple virtual views of the hand: the adaptive virtual multi-viewpoint (AVM), hierarchy feature estimation (HFE), and virtual viewpoint evaluation (VVE) modules. The AVM module adaptively adjusts the angles of the virtual viewpoint and learns the ideal virtual viewpoint to generate multiple informative virtual views. The HFE module estimates hand keypoints through hierarchical feature extraction. The VVE module evaluates virtual viewpoints by using chained high-level functions from the HFE module. The Swin Transformer is used as the backbone to extract the long-range semantic joint relationships in hand depth images. Extensive experiments demonstrate that the MPCTrans model achieves state-of-the-art performance on four challenging benchmark datasets.
Pages: 17