Dual-Path Transformer for 3D Human Pose Estimation

Cited by: 6

Authors:

Zhou, Lu [1]

Chen, Yingying [1]

Wang, Jinqiao [1,2,3]

Affiliations:

[1] Chinese Acad Sci, Inst Automat, Fdn Model Res Ctr, Beijing 100190, Peoples R China

[2] Wuhan AI Res, Wuhan 430073, Peoples R China

[3] Peng Cheng Lab, Shenzhen 518066, Peoples R China

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2024 / Vol. 34 / No. 05

Keywords:

Transformers; Three-dimensional displays; Pose estimation; Task analysis; Solid modeling; Feature extraction; Benchmark testing; 3D human pose estimation; transformer; motion; distillation;

DOI:

10.1109/TCSVT.2023.3318557

Chinese Library Classification:

TM [Electrical engineering]; TN [Electronics and communication technology];

Subject classification codes:

0808 ; 0809 ;

Abstract:

Video-based 3D human pose estimation has made great progress; however, it is still difficult to learn a precise 2D-to-3D projection in some hard cases. Multi-level human knowledge and motion information are two key elements for addressing these challenges: the former encodes human structure information spatially, and the latter captures motion change temporally. Inspired by this, we propose DualFormer (dual-path transformer), a network that encodes multiple human contexts and motion detail to perform spatial-temporal modeling. Firstly, motion information, which depicts the movement change of the human body, is embedded to provide an explicit motion prior for the transformer module. Secondly, a dual-path transformer framework is proposed to model long-range dependencies of both the joint sequence and the limb sequence. Parallel context embedding is performed initially, and a cross transformer block is then appended to promote interaction between the two paths, which greatly improves feature robustness. In particular, predictions at multiple levels can be acquired simultaneously. Lastly, we employ a weighted distillation technique to accelerate the convergence of the dual-path framework. We conduct extensive experiments on three benchmarks, i.e., Human3.6M, MPI-INF-3DHP and HumanEva-I. We mainly compute MPJPE, P-MPJPE, PCK and AUC to evaluate the effectiveness of the proposed approach, and our work achieves competitive results compared with state-of-the-art approaches. Specifically, the MPJPE on Human3.6M is reduced to 42.8mm, which is 1.5mm lower than PoseFormer and supports the efficacy of the proposed approach.
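
For readers unfamiliar with the metrics named in the abstract, the following is a minimal sketch of how MPJPE (mean per-joint position error) and PCK (percentage of correct keypoints) are commonly computed. It is not taken from the paper: the function names, the toy poses, and the 150 mm PCK threshold are assumptions chosen for illustration only.

```python
import math

def joint_errors(pred, gt):
    """Euclidean distance for every joint in every frame.

    pred and gt are nested lists shaped [frames][joints][3],
    with coordinates in millimetres (an assumption of this sketch).
    """
    return [
        math.dist(p, g)
        for pred_frame, gt_frame in zip(pred, gt)
        for p, g in zip(pred_frame, gt_frame)
    ]

def mpjpe(pred, gt):
    """Mean per-joint position error: average of all joint distances."""
    errors = joint_errors(pred, gt)
    return sum(errors) / len(errors)

def pck(pred, gt, threshold=150.0):
    """Fraction of joints whose error is below the threshold.

    The 150 mm default is an illustrative assumption, not a value
    stated in the record above.
    """
    errors = joint_errors(pred, gt)
    return sum(e < threshold for e in errors) / len(errors)

# Toy data: 2 frames, 3 joints, ground truth at the origin.
gt = [[[0.0, 0.0, 0.0] for _ in range(3)] for _ in range(2)]
pred = [[list(j) for j in frame] for frame in gt]
pred[0][0] = [3.0, 4.0, 0.0]     # 5 mm error
pred[1][2] = [0.0, 0.0, 200.0]   # 200 mm error

print(mpjpe(pred, gt))  # (5 + 200) / 6 joints
print(pck(pred, gt))    # 5 of 6 joints fall within 150 mm
```

P-MPJPE is the same distance computed after rigidly aligning the prediction to the ground truth (Procrustes alignment); that step is omitted here to keep the sketch short.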


Pages: 3260-3270

Page count: 11

