Multi-hypothesis representation learning for transformer-based 3D human pose estimation

Cited by: 7
Authors
Li, Wenhao [1]
Liu, Hong [1]
Tang, Hao [2]
Wang, Pichao [3,4]
Affiliations
[1] Peking Univ, Shenzhen Grad Sch, Key Lab Machine Percept, Beijing, Peoples R China
[2] Swiss Fed Inst Technol, Comp Vis Lab, Zurich, Switzerland
[3] Amazon Prime Video, Seattle, WA USA
[4] Alibaba Grp, Hangzhou, Peoples R China
Funding
National Key Research and Development Program of China;
Keywords
3D Human pose estimation; Transformer; Multi-Hypothesis; Self-Hypothesis; Cross-Hypothesis;
DOI
10.1016/j.patcog.2023.109631
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Despite significant progress, estimating 3D human poses from monocular videos remains a challenging task due to depth ambiguity and self-occlusion. Most existing works attempt to solve both issues by exploiting spatial and temporal relationships. However, those works ignore the fact that it is an inverse problem where multiple feasible solutions (i.e., hypotheses) exist. To relieve this limitation, we propose a Multi-Hypothesis Transformer that learns spatio-temporal representations of multiple plausible pose hypotheses. In order to effectively model multi-hypothesis dependencies and build strong relationships across hypothesis features, we introduce a one-to-many-to-one three-stage framework: (i) Generate multiple initial hypothesis representations; (ii) Model self-hypothesis communication, merge multiple hypotheses into a single converged representation and then partition it into several diverged hypotheses; (iii) Learn cross-hypothesis communication and aggregate the multi-hypothesis features to synthesize the final 3D pose. Through the above processes, the final representation is enhanced and the synthesized pose is much more accurate. Extensive experiments show that the proposed method achieves state-of-the-art results on two challenging datasets: Human3.6M and MPI-INF-3DHP. The code and models are available at https://github.com/Vegetebird/MHFormer. © 2023 Elsevier Ltd. All rights reserved.
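The one-to-many-to-one data flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation (see the repository linked above for that): fixed random linear maps stand in for the learned transformer blocks, the temporal dimension is omitted so a single frame is processed, and the function name `one_to_many_to_one`, the 17-joint layout, and all sizes are hypothetical choices made only for illustration.

```python
import random


def rand_matrix(rng, d_in, d_out):
    """Random d_in x d_out weight matrix (stand-in for learned weights)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(d_out)] for _ in range(d_in)]


def linear(x, w):
    """Multiply feature vector x (length d_in) by matrix w (d_in x d_out)."""
    return [sum(xi * wi for xi, wi in zip(x, col)) for col in zip(*w)]


def one_to_many_to_one(pose_2d, num_hyp=3, dim=8, num_joints=17, seed=0):
    """Toy data flow: one 2D input -> several hypotheses -> one 3D pose."""
    if num_hyp < 2:
        raise ValueError("cross-hypothesis mixing needs at least 2 hypotheses")
    rng = random.Random(seed)

    # Stage (i): generate multiple initial hypothesis representations.
    hyps = [linear(pose_2d, rand_matrix(rng, len(pose_2d), dim))
            for _ in range(num_hyp)]

    # Stage (ii): self-hypothesis step. Each hypothesis is refined on its own
    # (residual update), all are merged into one converged vector, and that
    # vector is partitioned back into num_hyp diverged hypotheses.
    hyps = [[h + r for h, r in zip(hyp, linear(hyp, rand_matrix(rng, dim, dim)))]
            for hyp in hyps]
    merged = [v for hyp in hyps for v in hyp]
    mixed = linear(merged, rand_matrix(rng, num_hyp * dim, num_hyp * dim))
    hyps = [mixed[i * dim:(i + 1) * dim] for i in range(num_hyp)]

    # Stage (iii): cross-hypothesis step. Each hypothesis receives the mean
    # of the other hypotheses, then all are aggregated into one 3D pose.
    updated = []
    for i, hyp in enumerate(hyps):
        others = [h for j, h in enumerate(hyps) if j != i]
        context = [sum(vals) / len(others) for vals in zip(*others)]
        updated.append([a + b for a, b in zip(hyp, context)])
    fused = [v for hyp in updated for v in hyp]
    pose_3d = linear(fused, rand_matrix(rng, num_hyp * dim, num_joints * 3))
    return updated, pose_3d


# Usage: a flattened 2D pose with 17 joints (x, y per joint).
hypotheses, pose = one_to_many_to_one([0.1] * 34)
print(len(hypotheses), len(hypotheses[0]), len(pose))
```

The sketch only shows how tensor shapes move through the three stages; the mean-based mixing in stage (iii) is a simple placeholder for whatever learned cross-hypothesis interaction the model actually uses.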
Pages: 12
Related papers
50 in total
  • [1] MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation
    Li, Wenhao
    Liu, Hong
    Tang, Hao
    Wang, Pichao
    Van Gool, Luc
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 13137 - 13146
  • [2] Diffusion-Based 3D Human Pose Estimation with Multi-Hypothesis Aggregation
    Shan, Wenkang
    Liu, Zhenhua
    Zhang, Xinfeng
    Wang, Zhao
    Han, Kai
    Wang, Shanshe
    Ma, Siwei
    Gao, Wen
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 14715 - 14725
  • [3] DBMHT: A double-branch multi-hypothesis transformer for 3D human pose estimation in video
    Xiang, Xuezhi
    Li, Xiaoheng
    Bao, Weijie
    Qiao, Yulong
    El Saddik, Abdulmotaleb
    [J]. COMPUTER VISION AND IMAGE UNDERSTANDING, 2024, 249
  • [4] PoSynDA: Multi-Hypothesis Pose Synthesis Domain Adaptation for Robust 3D Human Pose Estimation
    Liu, Hanbing
    He, Jun-Yan
    Cheng, Zhi-Qi
    Xiang, Wangmeng
    Yang, Qize
    Chai, Wenhao
    Wang, Gaoang
    Bao, Xu
    Luo, Bin
    Geng, Yifeng
    Xie, Xuansong
    [J]. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 5542 - 5551
  • [5] EMHIFormer: An Enhanced Multi-Hypothesis Interaction Transformer for 3D human estimation in video
    Xiang, Xuezhi
    Zhang, Kaixu
    Qiao, Yulong
    El Saddik, Abdulmotaleb
    [J]. JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2023, 95
  • [6] Transformer-based 3D Human pose estimation and action achievement evaluation
    Yang, Aolei
    Zhou, Yinghong
    Yang, Banghua
    Xu, Yulin
    [J]. Yi Qi Yi Biao Xue Bao/Chinese Journal of Scientific Instrument, 2024, 45 (04): : 136 - 144
  • [7] 3D human pose estimation with multi-hypotheses gated transformer
    Dong, Xiena
    Zhang, Jian
    Yu, Jun
    Yu, Ting
    [J]. MULTIMEDIA SYSTEMS, 2024, 30 (06)
  • [8] DiffPose: Multi-hypothesis Human Pose Estimation using Diffusion Models
    Holmquist, Karl
    Wandt, Bastian
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 15931 - 15941
  • [9] Transformer-based rapid human pose estimation network
    Wang, Dong
    Xie, Wenjun
    Cai, Youcheng
    Li, Xinjie
    Liu, Xiaoping
    [J]. COMPUTERS & GRAPHICS-UK, 2023, 116 : 317 - 326
  • [10] MHCanonNet: Multi-Hypothesis Canonical lifting Network for 3D human estimation in the wild video
    Kim, Hyun-Woo
    Lee, Gun-Hee
    Nam, Woo-Jeoung
    Jin, Kyung-Min
    Kang, Tae-Kyung
    Yang, Geon-Jun
    Lee, Seong-Whan
    [J]. PATTERN RECOGNITION, 2024, 145