MPCTrans: Multi-Perspective Cue-Aware Joint Relationship Representation for 3D Hand Pose Estimation via Swin Transformer

Cited by: 0
Authors
Wan, Xiangan [1 ]
Ju, Jianping [1 ]
Tang, Jianying [1 ]
Lin, Mingyu [1 ]
Rao, Ning [1 ]
Chen, Deng [2 ]
Liu, Tingting [1 ]
Li, Jing [1 ]
Bian, Fan [1 ]
Xiong, Nicholas [1 ]
Affiliations
[1] Hubei Business Coll, Sch Comp Sci & Technol, Wuhan 430079, Peoples R China
[2] Wuhan Inst Technol, Hubei Prov Key Lab Intelligent Robot, Wuhan 430079, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
depth image; 3D hand pose estimation; multi-perspective cues; Swin Transformer; deep learning; REGRESSION; NETWORK;
DOI
10.3390/s24217029
Chinese Library Classification
O65 [Analytical Chemistry];
Subject classification codes
070302 ; 081704 ;
Abstract
The objective of 3D hand pose estimation (HPE) based on depth images is to accurately locate and predict keypoints of the hand. However, this task remains challenging because of the variations in hand appearance from different viewpoints and severe occlusions. To effectively address these challenges, this study introduces a novel approach, called the multi-perspective cue-aware joint relationship representation for 3D HPE via the Swin Transformer (MPCTrans, for short). This approach is designed to learn multi-perspective cues and essential information from hand depth images. To achieve this goal, three novel modules are proposed to utilize features from multiple virtual views of the hand, namely, the adaptive virtual multi-viewpoint (AVM), hierarchy feature estimation (HFE), and virtual viewpoint evaluation (VVE) modules. The AVM module adaptively adjusts the angles of the virtual viewpoint and learns the ideal virtual viewpoint to generate informative multiple virtual views. The HFE module estimates hand keypoints through hierarchical feature extraction. The VVE module evaluates virtual viewpoints by using chained high-level functions from the HFE module. Transformer is used as a backbone to extract the long-range semantic joint relationships in hand depth images. Extensive experiments demonstrate that the MPCTrans model achieves state-of-the-art performance on four challenging benchmark datasets.
Pages: 17
Related papers
50 in total
  • [21] A Multi-Perspective 3D Reconstruction Method with Single Perspective Instantaneous Target Attitude Estimation
    Xu, Dan
    Xing, Mengdao
    Xia, Xiang-Gen
    Sun, Guang-Cai
    Fu, Jixiang
    Su, Tao
    REMOTE SENSING, 2019, 11 (11)
  • [22] A2J-Transformer: Anchor-to-Joint Transformer Network for 3D Interacting Hand Pose Estimation from a Single RGB Image
    Jiang, Changlong
    Xiao, Yang
    Wu, Cunlin
    Zhang, Mingyang
    Zheng, Jinghong
    Cao, Zhiguo
    Zhou, Joey Tianyi
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 8846 - 8855
  • [23] Multi-hop Graph Transformer Network for 3D Human Pose Estimation
    Islam, Zaedul
    Hamza, A. Ben
    arXiv,
  • [24] Multi-hop graph transformer network for 3D human pose estimation
    Islam, Zaedul
    Ben Hamza, A.
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2024, 101
  • [25] Learning scale-aware relationships via Laplacian decomposition-based transformer for 3D human pose estimation
    Kim, Jeonghwan
    Kwon, Hyukmin
    Lim, Seong Yong
    Kim, Wonjun
    MULTIMEDIA SYSTEMS, 2024, 30 (01)
  • [26] Learning scale-aware relationships via Laplacian decomposition-based transformer for 3D human pose estimation
    Jeonghwan Kim
    Hyukmin Kwon
    Seong Yong Lim
    Wonjun Kim
    Multimedia Systems, 2024, 30
  • [27] 3D Capsule Hand Pose Estimation Network Based on Structural Relationship Information
    Wu, Yiqi
    Ma, Shichao
    Zhang, Dejun
    Sun, Jun
    SYMMETRY-BASEL, 2020, 12 (10): : 1 - 14
  • [28] ESMformer: Error-aware self-supervised transformer for multi-view 3D human pose estimation
    Zhang, Lijun
    Zhou, Kangkang
    Lu, Feng
    Li, Zhenghao
    Shao, Xiaohu
    Zhou, Xiang-Dong
    Shi, Yu
    PATTERN RECOGNITION, 2025, 158
  • [29] Multi-pose 3D face recognition based on joint sparse representation
    Guo, Zhe
    Fan, Yangyu
    Lei, Tao
    Liu, Shu
    Guo, Z., 1600, Northwestern Polytechnical University (32): : 382 - 387
  • [30] 3D hand pose estimation and reconstruction based on multi-feature fusion
    Wang, Jiye
    Xiang, Xuezhi
    Ding, Shuai
    El Saddik, Abdulmotaleb
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2024, 101