VTP: volumetric transformer for multi-view multi-person 3D pose estimation

Authors
Yuxing Chen
Renshu Gu
Ouhan Huang
Gangyong Jia
Affiliations
[1] Hangzhou Dianzi University, The School of Computer Science and Technology
[2] Fudan University, Key Laboratory for Information Science of Electromagnetic Waves (MoE)
Source
Applied Intelligence | 2023 / Vol. 53
Keywords
3D human pose estimation; Sinkhorn transformer; Multi-person pose estimation; Volumetric representation; Multi-view pose estimation; Sparse sinkhorn attention
DOI
Not available
Abstract
This paper presents the Volumetric Transformer Pose Estimator (VTP), the first 3D volumetric transformer framework for multi-view multi-person 3D human pose estimation. VTP aggregates features from 2D keypoints in all camera views and directly learns the spatial relationships in the 3D voxel space in an end-to-end fashion. The aggregated 3D features are passed through 3D convolutions before being flattened into sequential embeddings and fed into a transformer. The output of the transformer is then concatenated with the 3D convolutional features through a residual design, which further improves performance. In addition, sparse Sinkhorn attention is employed to reduce the memory cost, a major bottleneck for volumetric representations, while still achieving excellent performance. The proposed VTP framework combines the high performance of the transformer with volumetric representations and can serve as a good alternative to convolutional backbones. Experiments on the Shelf, Campus and CMU Panoptic benchmarks show promising results in terms of both Mean Per Joint Position Error (MPJPE) and Percentage of Correctly estimated Parts (PCP). Our code will be available.
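The abstract names MPJPE as an evaluation metric without defining it. As a rough illustration only, the sketch below computes the commonly used definition (the mean Euclidean distance between predicted and ground-truth joint positions). The function name `mpjpe` and the three-joint pose values are hypothetical and are not taken from the paper or its benchmarks.

```python
import math


def mpjpe(pred, gt):
    """Mean Euclidean distance between matching joints (same units as the input)."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and of equal length")
    # One distance per joint, averaged over all joints.
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


# Hypothetical 3-joint pose, coordinates in millimetres (illustrative values only).
gt = [(0.0, 0.0, 0.0), (0.0, 0.0, 500.0), (0.0, 300.0, 500.0)]
pred = [(3.0, 4.0, 0.0), (0.0, 0.0, 500.0), (0.0, 300.0, 510.0)]

print(mpjpe(pred, gt))  # per-joint errors are 5, 0 and 10 mm, so this prints 5.0
```

Lower values indicate better pose estimates; in the multi-person setting, predictions would first have to be matched to ground-truth people, a step this sketch leaves out.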
Pages: 26568 - 26579
Number of pages: 11
Related papers
50 in total
  • [1] VTP: volumetric transformer for multi-view multi-person 3D pose estimation
    Chen, Yuxing
    Gu, Renshu
    Huang, Ouhan
    Jia, Gangyong
    APPLIED INTELLIGENCE, 2023, 53 (22) : 26568 - 26579
  • [2] Direct Multi-view Multi-person 3D Pose Estimation
    Wang, Tao
    Zhang, Jianfeng
    Cai, Yujun
    Yan, Shuicheng
    Feng, Jiashi
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [3] Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo
    Lin, Jiahao
    Lee, Gim Hee
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 11881 - 11890
  • [4] RF-based Multi-view Pose Machine for Multi-Person 3D Pose Estimation
    Xie, Chunyang
    Zhang, Dongheng
    Wu, Zhi
    Yu, Cong
    Hu, Yang
    Sun, Qibin
    Chen, Yan
    2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 2669 - 2674
  • [5] Unsupervised Multi-view Multi-person 3D Pose Estimation Using Reprojection Error
    de Franca Silva, Diogenes Wallis
    Do Monte Lima, Joao Paulo Silva
    Macedo, David
    Zanchettin, Cleber
    Thomas, Diego Gabriel Francis
    Uchiyama, Hideaki
    Teichrieb, Veronica
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2022, PT III, 2022, 13531 : 482 - 494
  • [7] Compressed Volumetric Heatmaps for Multi-Person 3D Pose Estimation
    Fabbri, Matteo
    Lanzi, Fabio
    Calderara, Simone
    Alletto, Stefano
    Cucchiara, Rita
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, : 7202 - 7211
  • [8] Skeleton Cluster Tracking for robust multi-view multi-person 3D human pose estimation
    Niu, Zehai
    Lu, Ke
    Xue, Jian
    Wang, Jinbao
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2024, 246
  • [9] Graph-Based 3D Multi-Person Pose Estimation Using Multi-View Images
    Wu, Size
    Jin, Sheng
    Liu, Wentao
    Bai, Lei
    Qian, Chen
    Liu, Dong
    Ouyang, Wanli
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 11128 - 11137
  • [10] ER-Net: Efficient Recalibration Network for Multi-View Multi-Person 3D Pose Estimation
    Zhou, Mi
    Liu, Rui
    Yi, Pengfei
    Zhou, Dongsheng
    CMES-COMPUTER MODELING IN ENGINEERING & SCIENCES, 2023, 136 (02) : 2093 - 2109