EMHIFormer: An Enhanced Multi-Hypothesis Interaction Transformer for 3D human pose estimation in video

Cited by: 2
Authors
Xiang, Xuezhi [1, 2]
Zhang, Kaixu [1]
Qiao, Yulong [1, 2]
El Saddik, Abdulmotaleb [3]
Affiliations
[1] Harbin Engn Univ, Sch Informat & Commun Engn, Harbin 150001, Peoples R China
[2] Minist Ind & Informat Technol, Key Lab Adv Marine Commun & Informat Technol, Harbin 150001, Peoples R China
[3] Univ Ottawa, Sch Elect Engn & Comp Sci, Ottawa, ON K1N 6N5, Canada
Funding
Natural Science Foundation of Heilongjiang Province; National Natural Science Foundation of China
Keywords
3D human pose estimation; Transformer; Cross-hypothesis; Enhanced regression head
DOI
10.1016/j.jvcir.2023.103890
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Monocular 3D human pose estimation is challenging because of depth ambiguity and occlusion. Recent methods alleviate these problems by exploiting spatio-temporal information and generating multiple hypotheses that represent diverse plausible solutions. However, these methods do not fully extract spatial and temporal information, nor do they fully model the relationships among hypotheses. To ease these limitations, we propose EMHIFormer (Enhanced Multi-Hypothesis Interaction Transformer) for more accurate 3D human pose estimation. Specifically, we build connections between different Transformer layers so that the model can integrate spatio-temporal information from the previous layer and establish more comprehensive hypotheses. Furthermore, a cross-hypothesis model consisting of a parallel Transformer is proposed to strengthen the relationships among the hypotheses. We also design an enhanced regression head that adaptively adjusts the channel weights to output the final 3D human pose. Extensive experiments on two challenging datasets, Human3.6M and MPI-INF-3DHP, are conducted to evaluate EMHIFormer. The results show that EMHIFormer achieves competitive performance on Human3.6M and state-of-the-art performance on MPI-INF-3DHP. Compared with its closest counterpart, MHFormer, our model improves P-MPJPE by 0.6% and MPJPE by 0.5% on the Human3.6M dataset, and MPJPE by 46.0% on MPI-INF-3DHP.
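For readers unfamiliar with the metric named in the abstract, MPJPE (mean per-joint position error) is the average Euclidean distance between predicted and ground-truth 3D joint positions. The sketch below is illustrative only and is not taken from the paper: the function names `mpjpe` and `relative_improvement` and the toy coordinates are hypothetical. Reading the abstract's percentages as relative error reductions is also an assumption, since the abstract does not say how they are computed.

```python
import math


def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance over joints.

    pred and gt are equal-length sequences of (x, y, z) joint coordinates.
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and of equal length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


def relative_improvement(baseline_error, new_error):
    """Percentage by which new_error is lower than baseline_error.

    Assumption: this is one common way to report gains as a percentage;
    the abstract does not state which definition it uses.
    """
    return 100.0 * (baseline_error - new_error) / baseline_error


# Toy three-joint skeleton in millimetres (hypothetical values, not paper data).
gt = [(0.0, 0.0, 0.0), (0.0, 100.0, 0.0), (0.0, 200.0, 0.0)]
pred_baseline = [(3.0, 4.0, 0.0), (0.0, 100.0, 10.0), (0.0, 200.0, 0.0)]
pred_new = [(0.0, 0.0, 0.0), (0.0, 100.0, 6.0), (0.0, 200.0, 0.0)]

err_baseline = mpjpe(pred_baseline, gt)  # joint errors 5, 10, 0
err_new = mpjpe(pred_new, gt)            # joint errors 0, 6, 0
print(err_baseline, err_new, relative_improvement(err_baseline, err_new))
```

P-MPJPE is the same quantity computed after rigidly aligning the prediction to the ground truth (Procrustes alignment); that alignment step is omitted here for brevity.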
Pages: 11