MPCTrans: Multi-Perspective Cue-Aware Joint Relationship Representation for 3D Hand Pose Estimation via Swin Transformer

Times cited: 0

Authors:

Wan, Xiangan [1]

Ju, Jianping [1]

Tang, Jianying [1]

Lin, Mingyu [1]

Rao, Ning [1]

Chen, Deng [2]

Liu, Tingting [1]

Li, Jing [1]

Bian, Fan [1]

Xiong, Nicholas [1]

Affiliations:

[1] Hubei Business College, School of Computer Science and Technology, Wuhan 430079, People's Republic of China

[2] Wuhan Institute of Technology, Hubei Provincial Key Laboratory of Intelligent Robot, Wuhan 430079, People's Republic of China

Source:

SENSORS | 2024 / Vol. 24 / No. 21

Funding:

National Natural Science Foundation of China

Keywords:

depth image; 3D hand pose estimation; multi-perspective cues; Swin Transformer; deep learning; REGRESSION; NETWORK

DOI:

10.3390/s24217029

Chinese Library Classification (CLC) number:

O65 [Analytical Chemistry]

Discipline classification codes:

070302; 081704

Abstract:

The objective of 3D hand pose estimation (HPE) from depth images is to accurately locate the keypoints of the hand. The task remains challenging because hand appearance varies strongly across viewpoints and because of severe occlusions. To address these challenges, this study introduces a novel approach, the multi-perspective cue-aware joint relationship representation for 3D HPE via the Swin Transformer (MPCTrans for short), which is designed to learn multi-perspective cues and essential information from hand depth images. To this end, three novel modules are proposed that exploit features from multiple virtual views of the hand: the adaptive virtual multi-viewpoint (AVM), hierarchy feature estimation (HFE), and virtual viewpoint evaluation (VVE) modules. The AVM module adaptively adjusts the virtual viewpoint angles and learns ideal virtual viewpoints in order to generate multiple informative virtual views. The HFE module estimates hand keypoints through hierarchical feature extraction. The VVE module evaluates the virtual viewpoints using chained high-level functions from the HFE module. A Swin Transformer serves as the backbone to extract long-range semantic joint relationships in hand depth images. Extensive experiments demonstrate that MPCTrans achieves state-of-the-art performance on four challenging benchmark datasets.
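The abstract does not give the formulations of the AVM or VVE modules. The Python sketch below illustrates only the general multi-view idea, under assumptions of our own: the hand is a 3D point cloud in camera coordinates, a virtual view is a yaw/pitch rotation followed by an orthographic z-buffer projection, and per-view joint estimates are rotated back to the camera frame and fused with softmax weights over view scores. In MPCTrans the angles are learned (AVM) and the scores come from the evaluation module (VVE); here both are plain inputs, and every function name (`rotation_matrix`, `rotate`, `render_virtual_view`, `fuse_views`) is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = List[List[float]]


def rotation_matrix(yaw: float, pitch: float) -> Mat3:
    """Return R = Rx(pitch) @ Ry(yaw) for angles given in radians."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    ry = [[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]]
    rx = [[1.0, 0.0, 0.0], [0.0, cp, -sp], [0.0, sp, cp]]
    return [[sum(rx[i][k] * ry[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotate(r: Mat3, p: Sequence[float], inverse: bool = False) -> Vec3:
    """Apply R, or its inverse R^T (R is orthonormal), to a 3D point."""
    if inverse:
        return tuple(sum(r[k][i] * p[k] for k in range(3)) for i in range(3))
    return tuple(sum(r[i][k] * p[k] for k in range(3)) for i in range(3))


def render_virtual_view(points: Sequence[Vec3], yaw: float, pitch: float,
                        size: int = 32, extent: float = 1.0) -> List[List[float]]:
    """Rotate the hand point cloud into a virtual camera and z-buffer it.

    Assumed orthographic projection onto a size x size grid covering
    [-extent, extent] in x and y; pixels hit by no point are set to 0.0.
    """
    r = rotation_matrix(yaw, pitch)
    depth = [[math.inf] * size for _ in range(size)]
    scale = (size - 1) / (2.0 * extent)
    for p in points:
        x, y, z = rotate(r, p)
        u = math.floor((x + extent) * scale + 0.5)  # nearest pixel column
        v = math.floor((y + extent) * scale + 0.5)  # nearest pixel row
        if 0 <= u < size and 0 <= v < size:
            depth[v][u] = min(depth[v][u], z)  # keep the nearest surface
    return [[0.0 if d == math.inf else d for d in row] for row in depth]


def fuse_views(joints_per_view: Sequence[Sequence[Vec3]],
               angles: Sequence[Tuple[float, float]],
               scores: Sequence[float]) -> List[Vec3]:
    """Fuse per-view joint estimates using softmax(view scores) as weights.

    joints_per_view[v][j] is joint j predicted in the frame of view v; each
    estimate is rotated back to the original camera frame before averaging.
    """
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    fused = [[0.0, 0.0, 0.0] for _ in joints_per_view[0]]
    for w, joints, (yaw, pitch) in zip(weights, joints_per_view, angles):
        r = rotation_matrix(yaw, pitch)
        for j, p in enumerate(joints):
            q = rotate(r, p, inverse=True)  # back to the camera frame
            for k in range(3):
                fused[j][k] += w * q[k]
    return [tuple(f) for f in fused]
```

A soft weighted average is used here only because it is a simple, differentiable way to combine views; a hard arg-max over the scores, which keeps just the single best view, would be an equally plausible reading of "evaluating" virtual viewpoints.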

Pages: 17