Dual-Path Transformer for 3D Human Pose Estimation

Cited by: 6

Authors:

Zhou, Lu [1]

Chen, Yingying [1]

Wang, Jinqiao [1,2,3]

Affiliations:

[1] Chinese Acad Sci, Inst Automat, Fdn Model Res Ctr, Beijing 100190, Peoples R China

[2] Wuhan AI Res, Wuhan 430073, Peoples R China

[3] Peng Cheng Lab, Shenzhen 518066, Peoples R China

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2024 / Vol. 34 / Issue 05

Keywords:

Transformers; Three-dimensional displays; Pose estimation; Task analysis; Solid modeling; Feature extraction; Benchmark testing; 3D human pose estimation; transformer; motion; distillation;

DOI:

10.1109/TCSVT.2023.3318557

Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Video-based 3D human pose estimation has made great progress, but learning a precise 2D-to-3D projection remains difficult in hard cases. Multi-level human knowledge and motion information are two key elements for addressing these challenges: the former encodes human structure information spatially, and the latter captures motion change temporally. Inspired by this, we propose DualFormer (dual-path transformer), a network that encodes multiple human contexts and motion detail to perform spatial-temporal modeling. First, motion information, which depicts the movement change of the human body, is embedded to provide an explicit motion prior for the transformer module. Second, a dual-path transformer framework is proposed to model long-range dependencies of both the joint sequence and the limb sequence. Parallel context embedding is performed first, and a cross transformer block is then appended to promote interaction between the two paths, which greatly improves feature robustness. Predictions at multiple levels can thus be acquired simultaneously. Lastly, we employ a weighted distillation technique to accelerate the convergence of the dual-path framework. We conduct extensive experiments on three benchmarks: Human3.6M, MPI-INF-3DHP and HumanEva-I. We mainly compute MPJPE, P-MPJPE, PCK and AUC to evaluate the effectiveness of the proposed approach, and our work achieves competitive results compared with state-of-the-art approaches. Specifically, the MPJPE is reduced to 42.8 mm on Human3.6M, 1.5 mm lower than PoseFormer, which demonstrates the efficacy of the proposed approach.
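For readers unfamiliar with the metrics named in the abstract, the following is a minimal sketch of how MPJPE (mean per-joint position error) and PCK (percentage of correct keypoints) are commonly computed for a single pose. It is not taken from the paper: the function names `mpjpe` and `pck`, the tuple-based pose layout, and the 150 mm default threshold are assumptions made for illustration only.

```python
import math


def mpjpe(pred, target):
    """Mean Euclidean distance between predicted and ground-truth joints.

    pred, target: equal-length sequences of (x, y, z) joint coordinates,
    in the same unit (e.g. millimetres). This layout is an assumption.
    """
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and equal in length")
    return sum(math.dist(p, t) for p, t in zip(pred, target)) / len(pred)


def pck(pred, target, threshold=150.0):
    """Fraction of joints whose error is within `threshold`.

    The 150.0 default is an illustrative assumption, not a value from the source.
    """
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and equal in length")
    correct = sum(1 for p, t in zip(pred, target) if math.dist(p, t) <= threshold)
    return correct / len(pred)


# Toy two-joint pose: the first joint is exact, the second is off by 5 units.
predicted = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
ground_truth = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
print(mpjpe(predicted, ground_truth))
print(pck(predicted, ground_truth, threshold=4.0))
```

P-MPJPE applies the same distance after rigidly aligning the prediction to the ground truth, and AUC summarizes PCK over a range of thresholds; neither is shown here.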

Pages: 3260-3270

Page count: 11

50 items in total

[41] Efficient Hierarchical Multi-view Fusion Transformer for 3D Human Pose Estimation
Zhou, Kangkang
Zhang, Lijun
Lu, Feng
Zhou, Xiang-Dong
Shi, Yu
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023: 7512-7520
[42] GraFormer: Graph-oriented Transformer for 3D Pose Estimation
Zhao, Weixi
Wang, Weiqiang
Tian, Yunjie
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022: 20406-20415
[43] Temporally Consistent 3D Human Pose Estimation Using Dual 360° Cameras
Shere, Matthew
Kim, Hansung
Hilton, Adrian
2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2021), 2021: 81-90
[44] 3D human pose estimation based on conditional dual-branch diffusion
Li, Jinghua
Bai, Zhuowei
Kong, Dehui
Chen, Dongpan
Li, Qianxing
Yin, Baocai
MULTIMEDIA SYSTEMS, 2025, 31 (01)
[45] A Dual-Augmentor Framework for Domain Generalization in 3D Human Pose Estimation
Peng, Qucheng
Zheng, Ce
Chen, Chen
2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024, 2024: 2240-2249
[46] DHRNet: A Dual-path Hierarchical Relation Network for multi-person pose estimation
Dang, Yonghao
Yin, Jianqin
Liu, Liyuan
Ding, Pengxiang
Sun, Yuan
Hu, Yanzhu
KNOWLEDGE-BASED SYSTEMS, 2024, 300
[47] Occlusion Resilient 3D Human Pose Estimation
Roy, Soumava Kumar
Badanin, Ilia
Honari, Sina
Fua, Pascal
2024 INTERNATIONAL CONFERENCE IN 3D VISION, 3DV 2024, 2024: 1198-1207
[48] A survey on monocular 3D human pose estimation
Ji X.
Fang Q.
Dong J.
Shuai Q.
Jiang W.
Zhou X.
Virtual Reality and Intelligent Hardware, 2020, 2 (06): 471-500
[49] Precise 3D Pose Estimation of Human Faces
Pernek, Akos
Hajder, Levente
PROCEEDINGS OF THE 2014 9TH INTERNATIONAL CONFERENCE ON COMPUTER VISION, THEORY AND APPLICATIONS (VISAPP 2014), VOL 3, 2014: 618-625
[50] A survey on deep 3D human pose estimation
Neupane, Rama Bastola
Li, Kan
Boka, Tesfaye Fenta
ARTIFICIAL INTELLIGENCE REVIEW, 2024, 58 (01)