VTP: volumetric transformer for multi-view multi-person 3D pose estimation

Authors
Yuxing Chen
Renshu Gu
Ouhan Huang
Gangyong Jia
Affiliations
[1] Hangzhou Dianzi University, The School of Computer Science and Technology
[2] Fudan University, Key Laboratory for Information Science of Electromagnetic Waves (MoE)
Source
Applied Intelligence | 2023 / Vol. 53
Keywords
3D human pose estimation; Sinkhorn transformer; Multi-person pose estimation; Volumetric representation; Multi-view pose estimation; Sparse Sinkhorn attention
DOI
Not available
Abstract
This paper presents the Volumetric Transformer Pose Estimator (VTP), the first 3D volumetric transformer framework for multi-view multi-person 3D human pose estimation. VTP aggregates features from 2D keypoints in all camera views and directly learns spatial relationships in the 3D voxel space in an end-to-end fashion. The aggregated 3D features are passed through 3D convolutions, then flattened into sequential embeddings and fed into a transformer. Sparse Sinkhorn attention is employed to reduce the memory cost, a major bottleneck for volumetric representations, while still achieving excellent performance. To further improve performance, the transformer output is concatenated with the 3D convolutional features through a residual design. By combining the high performance of the transformer with volumetric representations, the proposed VTP framework can serve as a good alternative to convolutional backbones. Experiments on the Shelf, Campus and CMU Panoptic benchmarks show promising results in terms of both Mean Per Joint Position Error (MPJPE) and Percentage of Correctly estimated Parts (PCP). Our code will be available.
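The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code. It assumes the commonly used definitions: MPJPE is the mean Euclidean distance between predicted and ground-truth joints, and a limb counts as correct for PCP when the average error of its two endpoints is at most half the ground-truth limb length. The function names, the `alpha` threshold parameter, the toy skeleton, and the millimetre units are all illustrative assumptions.

```python
import math

def mpjpe(pred, gt):
    """Mean Euclidean distance between matching joints (same units as the input)."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)

def pcp(pred, gt, limbs, alpha=0.5):
    """Fraction of limbs whose mean endpoint error is within alpha * limb length.

    `limbs` is a list of (joint_index_a, joint_index_b) pairs; alpha=0.5 is the
    assumed conventional threshold, not a value taken from the paper.
    """
    correct = 0
    for a, b in limbs:
        limb_len = math.dist(gt[a], gt[b])
        mean_err = (math.dist(pred[a], gt[a]) + math.dist(pred[b], gt[b])) / 2
        if mean_err <= alpha * limb_len:
            correct += 1
    return correct / len(limbs)

# Hypothetical three-joint chain, coordinates in millimetres.
gt = [(0.0, 0.0, 0.0), (0.0, 0.0, 100.0), (0.0, 0.0, 200.0)]
pred = [(0.0, 0.0, 0.0), (30.0, 0.0, 100.0), (0.0, 90.0, 200.0)]
limbs = [(0, 1), (1, 2)]

print(mpjpe(pred, gt))       # per-joint errors are 0, 30 and 90 mm
print(pcp(pred, gt, limbs))  # first limb passes the threshold, second does not
```

Lower MPJPE and higher PCP indicate better poses. PCP is scale-aware because the threshold depends on each limb's length, so a fixed error counts for more on a short limb than on a long one.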
Pages: 26568-26579
Number of pages: 11