A multi-granular joint tracing transformer for video-based 3D human pose estimation

Cited by: 0
Authors
Yingying Hou [1 ]
Zhenhua Huang [1 ]
Wentao Zhu [2 ]
Affiliations
[1] Anhui University
[2] Amazon Research
Keywords
3D human pose estimation; Joint-tracing transformer; Temporal dependencies; Spatial relationship;
DOI
10.1007/s11760-024-03589-0
Abstract
Human pose estimation from monocular images captured by motion capture cameras is a crucial task with a wide range of downstream applications, e.g., action recognition, motion transfer, and movie making. However, previous methods have not effectively addressed the depth blur problem while considering the temporal correlation of individual and multiple body joints together. We address this issue by simultaneously exploiting temporal information at both the single-joint and multiple-joint granularities. Inspired by the observation that different body joints follow different moving trajectories yet can be correlated with one another, we propose the multi-granularity joint tracing transformer (MOTT). MOTT consists of two main components: (1) a spatial transformer that encodes each frame to obtain spatial embeddings of all joints, and (2) a multi-granularity temporal transformer comprising a holistic temporal transformer, which handles the temporal correlation among all joints across consecutive frames, and a joint tracing temporal transformer, which processes the temporal embedding of each individual joint. The outputs of the two branches are fused to produce accurate 3D human poses. Extensive experiments on the Human3.6M and MPI-INF-3DHP datasets demonstrate that MOTT effectively encodes the spatial and temporal dependencies between body joints and outperforms previous methods in terms of mean per joint position error.
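The abstract describes a two-stage pipeline: per-frame spatial attention over joints, followed by two parallel temporal branches (holistic, treating each frame's full joint set as one token, and joint-tracing, attending along time separately per joint) whose outputs are fused to regress 3D poses. The following is a minimal NumPy sketch of that data flow only, under assumed toy dimensions; all weights are random, the single-head attention is untrained, and no claim is made that this matches MOTT's actual layer configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def self_attention(x):
    # Toy single-head scaled dot-product self-attention over axis 0.
    # x: (seq_len, dim) -> (seq_len, dim)
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                     # (seq_len, seq_len)
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax rows
    return weights @ x

T, J, C = 9, 17, 32                       # frames, joints, channels (assumed toy sizes)
pose_2d = rng.standard_normal((T, J, 2))  # 2D keypoints per frame

# (1) Spatial transformer: embed each joint, attend across joints within a frame.
W_embed = rng.standard_normal((2, C)) * 0.1
spatial = np.stack([self_attention(frame @ W_embed) for frame in pose_2d])  # (T, J, C)

# (2a) Holistic temporal branch: one token per frame containing all joints.
holistic = self_attention(spatial.reshape(T, J * C)).reshape(T, J, C)

# (2b) Joint-tracing temporal branch: attend along time, one joint at a time.
traced = np.stack([self_attention(spatial[:, j]) for j in range(J)], axis=1)  # (T, J, C)

# Fuse both branches and regress per-joint 3D coordinates.
W_out = rng.standard_normal((2 * C, 3)) * 0.1
fused = np.concatenate([holistic, traced], axis=-1)   # (T, J, 2C)
pose_3d = fused @ W_out                               # (T, J, 3)
print(pose_3d.shape)
```

The sketch makes the granularity distinction concrete: the holistic branch mixes information across joints and time jointly (its tokens are whole frames), while the joint-tracing branch models each joint's own trajectory in isolation before fusion.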