DBMHT: A double-branch multi-hypothesis transformer for 3D human pose estimation in video

Cited by: 0

Authors:

Xiang, Xuezhi [1,2]

Li, Xiaoheng [1]

Bao, Weijie [1]

Qiao, Yulong [1,3]

El Saddik, Abdulmotaleb [3]

Affiliations:

[1] Harbin Engn Univ, Sch Informat & Commun Engn, Harbin 150001, Peoples R China

[2] Minist Ind & Informat Technol, Key Lab Adv Marine Commun & Informat Technol, Harbin 150001, Peoples R China

[3] Univ Ottawa, Sch Elect Engn & Comp Sci, Ottawa, ON K1N 6N5, Canada

Source:

COMPUTER VISION AND IMAGE UNDERSTANDING | 2024 / Vol. 249

Funding:

Natural Science Foundation of Heilongjiang Province; National Natural Science Foundation of China;

Keywords:

3D human pose estimation; Transformer; Dual-branch; Cross-hypothesis;

DOI:

10.1016/j.cviu.2024.104147

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Estimating 3D human poses from monocular video remains a significant challenge: existing methods suffer from depth ambiguity and self-occlusion. To address these problems, we propose a Double-Branch Multi-Hypothesis Transformer (DBMHT). Specifically, a double-branch architecture captures temporal and spatial information and generates multiple hypotheses, and a lightweight module then merges these hypotheses by integrating their spatial and temporal representations. DBMHT thus captures spatial information from each joint of the human body and temporal information from each frame of the video, and it merges multiple hypotheses that carry different spatio-temporal information. Comprehensive evaluation on two challenging datasets (i.e., Human3.6M and MPI-INF-3DHP) demonstrates the superior performance of DBMHT, marking it as a robust and efficient approach for accurate 3D human pose estimation (HPE) in dynamic scenarios. The results show that our model surpasses the state-of-the-art approach by 1.9% in MPJPE when ground-truth 2D keypoints are used as input.
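
The abstract reports its result in MPJPE (mean per-joint position error), which is the average Euclidean distance between predicted and ground-truth 3D joint positions. The following is a minimal sketch of that metric only, not the authors' evaluation code. The function name `mpjpe` and the nested-list input layout are assumptions made for illustration, the sample coordinates are made up, and the root-joint alignment commonly applied before scoring on Human3.6M is omitted for brevity.

```python
import math


def mpjpe(predicted, target):
    """Mean per-joint position error.

    predicted, target: lists of frames, where each frame is a list of
    (x, y, z) joint coordinates. Both inputs are assumed to have the
    same number of frames and joints (hypothetical layout).
    """
    total = 0.0
    count = 0
    for pred_frame, gt_frame in zip(predicted, target):
        for pred_joint, gt_joint in zip(pred_frame, gt_frame):
            # Euclidean distance between one predicted joint and its ground truth.
            total += math.dist(pred_joint, gt_joint)
            count += 1
    if count == 0:
        raise ValueError("no joints to evaluate")
    return total / count


# Illustrative values only: one frame with two joints.
pred = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
gt = [[(0.0, 0.0, 3.0), (1.0, 4.0, 0.0)]]
error = mpjpe(pred, gt)  # per-joint distances are 3 and 4
```

The result is expressed in the same unit as the input coordinates; pose benchmarks typically report it in millimetres.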

Pages: 8

50 items in total

[21] Dual-Path Transformer for 3D Human Pose Estimation
Zhou, Lu
Chen, Yingying
Wang, Jinqiao
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (05) : 3260 - 3270
[22] DGFormer: Dynamic graph transformer for 3D human pose estimation
Chen Z.
Dai J.
Bai J.
Pan J.
Pattern Recognition, 2024, 152
[23] End-to-end 3D Human Pose Estimation with Transformer
Zhang, Bowei
Cui, Peng
2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 4529 - 4536
[24] Video-Based 3D Human Pose Estimation Research
Tao, Siting
Zhang, Zhi
2022 IEEE 17TH CONFERENCE ON INDUSTRIAL ELECTRONICS AND APPLICATIONS (ICIEA), 2022, : 485 - 490
[25] Combination of Deep Learner Network and Transformer for 3D Human Pose Estimation
Tien-Dat Tran
Xuan-Thuy Vo
Duy-Linh Nguyen
Jo, Kang-Hyun
2022 22ND INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND SYSTEMS (ICCAS 2022), 2022, : 174 - 178
[26] Exploiting Temporal Contexts With Strided Transformer for 3D Human Pose Estimation
Li, Wenhao
Liu, Hong
Ding, Runwei
Liu, Mengyuan
Wang, Pichao
Yang, Wenming
IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 1282 - 1293
[27] Snipper: A Spatiotemporal Transformer for Simultaneous Multi-Person 3D Pose Estimation Tracking and Forecasting on a Video Snippet
Zou, Shihao
Xu, Yuanlu
Li, Chao
Ma, Lingni
Cheng, Li
Vo, Minh
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (09) : 4921 - 4933
[28] Double chain networks for monocular 3D human pose estimation
Bai, Guihu
Luo, Yanmin
Pan, Xueliang
Wang, Youjie
Wang, Jia
Guo, Jingming
IMAGE AND VISION COMPUTING, 2022, 123
[29] Joint Camera Pose Estimation and 3D Human Pose Estimation in a Multi-camera Setup
Puwein, Jens
Ballan, Luca
Ziegler, Remo
Pollefeys, Marc
COMPUTER VISION - ACCV 2014, PT II, 2015, 9004 : 473 - 487
[30] Occlusion-Aware Networks for 3D Human Pose Estimation in Video
Cheng, Yu
Yang, Bo
Wang, Bo
Yan, Wending
Tan, Robby T.
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 723 - 732
