VTP: volumetric transformer for multi-view multi-person 3D pose estimation

被引:0
|
作者
Yuxing Chen
Renshu Gu
Ouhan Huang
Gangyong Jia
机构
[1] Hangzhou Dianzi University,The School of Computer Science and Technology
[2] Fudan University,Key Laboratory for Information Science of Electromagnetic Waves (MoE)
来源
Applied Intelligence | 2023年 / 53卷
关键词
3D human pose estimation; Sinkhorn transformer; Multi-person pose estimation; Volumetric representation; Multi-view pose estimation; Sparse sinkhorn attention;
D O I
暂无
中图分类号
学科分类号
摘要
This paper presents Volumetric Transformer Pose Estimator (VTP), the first 3D volumetric transformer framework for multi-view multi-person 3D human pose estimation. VTP aggregates features from 2D keypoints in all camera views and directly learns the spatial relationships in the 3D voxel space in an end-to-end fashion. The aggregated 3D features are passed through 3D convolutions before being flattened into sequential embeddings and fed into a transformer. A residual structure is designed to further improve the performance. In addition, the sparse Sinkhorn attention is empowered to reduce the memory cost, which is a major bottleneck for volumetric representations, while also achieving excellent performance. The output of the transformer is again concatenated with 3D convolutional features by a residual design. The proposed VTP framework integrates the high performance of the transformer with volumetric representations, which can be used as a good alternative to the convolutional backbones. Experiments on the Shelf, Campus and CMU Panoptic benchmarks show promising results in terms of both Mean Per Joint Position Error (MPJPE) and Percentage of Correctly estimated Parts (PCP). Our code will be available.
引用
收藏
页码:26568 / 26579
页数:11
相关论文
共 50 条
  • [21] Adaptive Multi-View and Temporal Fusing Transformer for 3D Human Pose Estimation
    Shuai, Hui
    Wu, Lele
    Liu, Qingshan
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (04) : 4122 - 4135
  • [22] Efficient Hierarchical Multi-view Fusion Transformer for 3D Human Pose Estimation
    Zhou, Kangkang
    Zhang, Lijun
    Lu, Feng
    Zhou, Xiang-Dong
    Shi, Yu
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 7512 - 7520
  • [23] Deep Semantic Graph Transformer for Multi-View 3D Human Pose Estimation
    Zhang, Lijun
    Zhou, Kangkang
    Lu, Feng
    Zhou, Xiang-Dong
    Shi, Yu
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 7, 2024, : 7205 - 7214
  • [24] Multi-person 3D Pose Estimation from Monocular Image Sequences
    Li, Ran
    Xu, Nayun
    Lu, Xutong
    Xing, Yucheng
    Zhao, Haohua
    Niu, Li
    Zhang, Liqing
    NEURAL INFORMATION PROCESSING (ICONIP 2019), PT II, 2019, 11954 : 15 - 24
  • [25] Efficient Multi-Person Hierarchical 3D Pose Estimation for Autonomous Driving
    Gu, Renshu
    Wang, Gaoang
    Hwang, Jenq-Neng
    2019 2ND IEEE CONFERENCE ON MULTIMEDIA INFORMATION PROCESSING AND RETRIEVAL (MIPR 2019), 2019, : 163 - 168
  • [26] Multi-Person 3D Human Pose Estimation from Monocular Images
    Dabral, Rishabh
    Gundavarapu, Nitesh B.
    Mitra, Rahul
    Sharma, Abhishek
    Ramakrishnan, Ganesh
    Jain, Arjun
    2019 INTERNATIONAL CONFERENCE ON 3D VISION (3DV 2019), 2019, : 405 - 414
  • [27] Mutual Adaptive Reasoning for Monocular 3D Multi-Person Pose Estimation
    Zhang, Juze
    Wang, Jingya
    Shi, Ye
    Gao, Fei
    Xu, Lan
    Yu, Jingyi
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 1788 - 1796
  • [28] VoxelTrack: Multi-Person 3D Human Pose Estimation and Tracking in the Wild
    Zhang, Yifu
    Wang, Chunyu
    Wang, Xinggang
    Liu, Wenyu
    Zeng, Wenjun
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (02) : 2613 - 2626
  • [29] Multi-Person Absolute 3D Pose and Shape Estimation from Video
    Zhang, Kaifu
    Li, Yihui
    Guan, Yisheng
    Xi, Ning
    INTELLIGENT ROBOTICS AND APPLICATIONS, ICIRA 2021, PT III, 2021, 13015 : 189 - 200
  • [30] Explicit Occlusion Reasoning for Multi-person 3D Human Pose Estimation
    Liu, Qihao
    Zhang, Yi
    Bai, Song
    Yuille, Alan
    COMPUTER VISION - ECCV 2022, PT V, 2022, 13665 : 497 - 517