CapsulePose: A variational CapsNet for real-time end-to-end 3D human pose estimation

Cited by: 5
Authors
Garau, Nicola [1]
Conci, Nicola [1]
Affiliations
[1] Univ Trento, Via Sommar 9, I-38123 Trento, Italy
Keywords
Capsule networks; 3D human pose estimation; Viewpoint-equivariance; Deep learning; Real-time; Recognition
DOI
10.1016/j.neucom.2022.11.097
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Estimating 3D human poses from images is an ill-posed regression problem, which is usually tackled by viewpoint-invariant convolutional neural networks (CNNs). Recently, capsule networks (CapsNets) have been introduced as a viable alternative to CNNs, ensuring viewpoint-equivariance and drastically reducing both the dataset size and the network complexity, while retaining high output accuracy. We propose a real-time end-to-end human pose estimation (HPE) network which employs state-of-the-art matrix capsules [1] and a fast variational Bayesian capsule routing, without relying on pre-training, complex data augmentation or multiple datasets. We achieve comparable results to the HPE state-of-the-art, and the lowest error among methods using CapsNets, while at the same time achieving other desirable properties, namely greater generalization capabilities, stronger viewpoint equivariance and highly decreased data dependency, allowing for our network to be trained with only a fraction of the available datasets and without any data augmentation.
Pages: 81-91
Page count: 11