Dual-Path Transformer for 3D Human Pose Estimation

Cited by: 6
Authors
Zhou, Lu [1]
Chen, Yingying [1]
Wang, Jinqiao [1, 2, 3]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Fdn Model Res Ctr, Beijing 100190, Peoples R China
[2] Wuhan AI Res, Wuhan 430073, Peoples R China
[3] Peng Cheng Lab, Shenzhen 518066, Peoples R China
Keywords
Transformers; Three-dimensional displays; Pose estimation; Task analysis; Solid modeling; Feature extraction; Benchmark testing; 3D human pose estimation; transformer; motion; distillation;
DOI
10.1109/TCSVT.2023.3318557
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract
Video-based 3D human pose estimation has made great progress, but learning a precise 2D-to-3D projection remains difficult in hard cases. Multi-level human knowledge and motion information are two key elements for addressing these challenges: the former encodes human structure spatially, and the latter captures motion changes temporally. Inspired by this, we propose DualFormer (dual-path transformer), a network that encodes multiple human contexts and motion details to perform spatial-temporal modeling. First, motion information, which depicts the movement of the human body, is embedded to provide an explicit motion prior for the transformer module. Second, a dual-path transformer framework is proposed to model long-range dependencies of both the joint sequence and the limb sequence. Parallel context embedding is performed first, and a cross transformer block is then appended to promote interaction between the two paths, which greatly improves feature robustness. Predictions at multiple levels can be obtained simultaneously. Lastly, we employ a weighted distillation technique to accelerate the convergence of the dual-path framework. We conduct extensive experiments on three benchmarks: Human3.6M, MPI-INF-3DHP, and HumanEva-I. We mainly compute MPJPE, P-MPJPE, PCK, and AUC to evaluate the effectiveness of the proposed approach, and our work achieves competitive results compared with state-of-the-art approaches. Specifically, the MPJPE on Human3.6M is reduced to 42.8 mm, which is 1.5 mm lower than PoseFormer, demonstrating the efficacy of the proposed approach.
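For readers unfamiliar with the headline metric, MPJPE (mean per-joint position error) is conventionally the Euclidean distance between predicted and ground-truth 3D joint positions, averaged over joints and frames. The sketch below illustrates that standard definition with NumPy; it is not the authors' evaluation code, and the function name `mpjpe`, the array layout `(frames, joints, 3)`, and the toy data are assumptions made for illustration.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean per-joint position error.

    pred, gt: arrays of shape (frames, joints, 3), in the same unit
    (e.g. millimetres). This layout is an assumption of the sketch.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    if pred.shape != gt.shape:
        raise ValueError("pred and gt must have the same shape")
    # Euclidean distance per joint, then the mean over all joints and frames.
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


# Toy data (hypothetical): 2 frames, 17 joints, every joint offset by (3, 4, 0).
gt = np.zeros((2, 17, 3))
pred = gt + np.array([3.0, 4.0, 0.0])
error = mpjpe(pred, gt)  # each joint is off by sqrt(3^2 + 4^2) = 5 units
```

P-MPJPE applies the same computation after rigidly aligning the prediction to the ground truth (Procrustes alignment), which this sketch omits.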
Pages: 3260-3270
Page count: 11