EMHIFormer: An Enhanced Multi-Hypothesis Interaction Transformer for 3D human estimation in video✩

被引:2
|
作者
Xiang, Xuezhi [1 ,2 ]
Zhang, Kaixu [1 ]
Qiao, Yulong [1 ,2 ]
El Saddik, Abdulmotaleb [3 ]
机构
[1] Harbin Engn Univ, Sch Informat & Commun Engn, Harbin 150001, Peoples R China
[2] Minist Ind & Informat Technol, Key Lab Adv Marine Commun & Informat Technol, Harbin 150001, Peoples R China
[3] Univ Ottawa, Sch Elect Engn & Comp Sci, Ottawa, ON K1N 6N5, Canada
基金
黑龙江省自然科学基金; 中国国家自然科学基金;
关键词
3D human pose estimation; Transformer; Cross-hypothesis; Enhanced regression head;
D O I
10.1016/j.jvcir.2023.103890
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Monocular 3D human pose estimation is a challenging task because of depth ambiguity and occlusion. Recent methods exploit spatio-temporal information and generate different hypotheses for simulating diverse solutions to alleviate these problems. However, these methods do not fully extract spatial and temporal information and the relationship of each hypothesis. To ease these limitations, we propose EMHIFormer (Enhanced Multi-Hypothesis Interaction Transformer) to model 3D human pose with better performance. In detail, we build connections between different Transformer layers so that our model is able to integrate spatio-temporal information from the previous layer and establish more comprehensive hypotheses. Furthermore, a cross-hypothesis model consisting of a parallel Transformer is proposed to strengthen the relationship between various hypotheses. We also design an enhanced regression head which adaptively adjusts the channel weights to export the final 3D human pose. Extensive experiments are conducted on two challenging datasets: Human3.6M and MPI-INF-3DHP to evaluate our EMHIFormer. The results show that EMHIFormer achieves competitive performance on Human3.6M and state-of-the-art performance on MPI-INF-3DHP. Compared with the closest counterpart, MHFormer, our model outperforms it by 0.6% P-MPJPE and 0.5% MPJPE on Human3.6M dataset and 46.0% MPJPE on MPI-INF-3DHP.
引用
收藏
页数:11
相关论文
共 50 条
  • [21] Adaptive Multi-View and Temporal Fusing Transformer for 3D Human Pose Estimation
    Shuai, Hui
    Wu, Lele
    Liu, Qingshan
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (04) : 4122 - 4135
  • [22] Efficient Hierarchical Multi-view Fusion Transformer for 3D Human Pose Estimation
    Zhou, Kangkang
    Zhang, Lijun
    Lu, Feng
    Zhou, Xiang-Dong
    Shi, Yu
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 7512 - 7520
  • [23] InterTrack: Interaction Transformer for 3D Multi-Object Tracking
    Willes, John
    Reading, Cody
    Waslander, Steven L.
    2023 20TH CONFERENCE ON ROBOTS AND VISION, CRV, 2023, : 73 - 80
  • [24] Dual-Path Transformer for 3D Human Pose Estimation
    Zhou, Lu
    Chen, Yingying
    Wang, Jinqiao
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (05) : 3260 - 3270
  • [25] DGFormer: Dynamic graph transformer for 3D human pose estimation
    Chen Z.
    Dai J.
    Bai J.
    Pan J.
    Pattern Recognition, 2024, 152
  • [26] End-to-end 3D Human Pose Estimation with Transformer
    Zhang, Bowei
    Cui, Peng
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 4529 - 4536
  • [27] Snipper: A Spatiotemporal Transformer for Simultaneous Multi-Person 3D Pose Estimation Tracking and Forecasting on a Video Snippet
    Zou, Shihao
    Xu, Yuanlu
    Li, Chao
    Ma, Lingni
    Cheng, Li
    Vo, Minh
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (09) : 4921 - 4933
  • [28] 3D MOTION ESTIMATION FOR 3D VIDEO CODING
    Paul, Manoranjan
    Gao, Junbin
    Anotolovich, Michael
    2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 1189 - 1192
  • [29] Deep Feature Domain Motion Estimation and Multi-Layer Multi-Hypothesis Motion Compensation Net for Video Compression Codec
    Yang C.
    Lü Z.
    Huanan Ligong Daxue Xuebao/Journal of South China University of Technology (Natural Science), 2022, 50 (10): : 51 - 61
  • [30] Real-time reliability measure-driven multi-hypothesis tracking using 2D and 3D features
    Marcos D Zúñiga
    François Brémond
    Monique Thonnat
    EURASIP Journal on Advances in Signal Processing, 2011