CVML-Pose: Convolutional VAE Based Multi-Level Network for Object 3D Pose Estimation

Cited by: 3
Authors
Zhao, Jianyu [1]
Sanderson, Edward [1]
Matuszewski, Bogdan J. J. [1]
Affiliations
[1] Univ Cent Lancashire, Comp Vis & Machine Learning CVML Grp, Preston PR1 2HE, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
3D pose estimation; deep learning; variational autoencoder; synthetic data; 6D pose
DOI
10.1109/ACCESS.2023.3243551
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Most vision-based 3D pose estimation approaches rely on knowledge of the object's 3D model and on depth measurements, and often require time-consuming iterative refinement to improve accuracy. These requirements limit broader real-life application, and addressing them is the main motivation for this paper. To this end, a novel Convolutional Variational Auto-Encoder based Multi-Level Network for object 3D pose estimation (CVML-Pose) is proposed. Unlike most other methods, CVML-Pose implicitly learns an object's 3D pose from RGB images only, encoding it in its latent space, without knowing the object's 3D model or depth information and without post-refinement. CVML-Pose consists of two main modules: (i) CVML-AE, a convolutional variational autoencoder whose role is to extract features from RGB images, and (ii) Multi-Layer Perceptron and K-Nearest Neighbor regressors that map the latent variables to the object's 3D pose, namely rotation and translation, respectively. CVML-Pose has been evaluated on the LineMod and LineMod-Occlusion benchmark datasets. It outperforms other methods based on latent representations and achieves results comparable to the state of the art, but without using a 3D model or depth measurements. Visualization with the t-Distributed Stochastic Neighbor Embedding algorithm shows that the CVML-Pose latent space successfully represents objects' category and topology. This opens up the prospect of integrated estimation of pose and other attributes (possibly including surface finish or shape variations), which, together with the real-time processing made possible by the absence of iterative refinement, can facilitate various robotic applications. Code is available at https://github.com/JZhao12/CVML-Pose.
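The second module described in the abstract maps latent variables to pose with regressors. As a rough illustration only, the sketch below shows a plain K-Nearest Neighbor regression from latent vectors to translation. It is not the authors' implementation: the function name `knn_regress`, the unweighted mean over neighbors, the Euclidean distance, and the toy 2-D latents and 3-D translations are all assumptions made for this example. The encoder that would produce the latents is omitted.

```python
import math


def knn_regress(train_latents, train_targets, query, k=3):
    """Average the targets of the k training latents closest to `query`.

    Hypothetical helper: the distance metric and the unweighted mean are
    assumptions of this sketch, not details taken from the paper.
    """
    if not 1 <= k <= len(train_latents):
        raise ValueError("k must be between 1 and the number of training samples")
    # Euclidean distance from the query to every stored latent vector.
    dists = [(math.dist(z, query), i) for i, z in enumerate(train_latents)]
    nearest = [i for _, i in sorted(dists)[:k]]
    dim = len(train_targets[0])
    # Component-wise mean of the neighbors' target vectors.
    return [sum(train_targets[i][d] for i in nearest) / k for d in range(dim)]


# Toy data (made up): 2-D latent codes paired with 3-D translations.
latents = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
translations = [[0.0, 0.0, 1.0], [0.2, 0.0, 1.0], [0.0, 0.2, 1.0], [1.0, 1.0, 2.0]]

estimate = knn_regress(latents, translations, [0.1, 0.1], k=3)
```

Here the three nearest latents are the first three training samples, so `estimate` is the mean of their translations; the distant fourth sample does not contribute.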
Pages: 13830-13845
Number of pages: 16
Related papers
  • [1] MANet: Multi-level Attention Network for 3D Human Shape and Pose Estimation
    Yao, Chenhao
    Li, Guiqing
    Zeng, Juncheng
    Nie, Yongwei
    Xian, Chuhua
    ADVANCES IN COMPUTER GRAPHICS, CGI 2023, PT I, 2024, 14495 : 476 - 488
  • [2] Graph Convolutional Network for 3D Object Pose Estimation in a Point Cloud
    Jung, Tae-Won
    Jeong, Chi-Seo
    Kim, In-Seon
    Yu, Min-Su
    Kwon, Soon-Chul
    Jung, Kye-Dong
    SENSORS, 2022, 22 (21)
  • [3] A Multi-Level Network for Human Pose Estimation
    Shao, Zhanpeng
    Liu, Peng
    Li, Youfu
    Yang, Jianyu
    Zhou, Xiaolong
    2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021, : 13085 - 13091
  • [4] Multi-Task and Multi-Level Detection Neural Network Based Real-Time 3D Pose Estimation
    Luo, Dingli
    Du, Songlin
    Ikenaga, Takeshi
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 1427 - 1434
  • [5] Exploring multi-level transformers with feature frame padding network for 3D human pose estimation
    Arthanari, Sathiyamoorthi
    Jeong, Jae Hoon
    Joo, Young Hoon
    MULTIMEDIA SYSTEMS, 2024, 30 (05)
  • [6] Modulated Graph Convolutional Network for 3D Human Pose Estimation
    Zou, Zhiming
    Tang, Wei
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 11457 - 11467
  • [7] Flexible Graph Convolutional Network for 3D Human Pose Estimation
    Shahjahan, Abu Taib Mohammed
    Hamza, A. Ben
    arXiv
  • [8] Encoder-decoder with Multi-level Attention for 3D Human Shape and Pose Estimation
    Wan, Ziniu
    Li, Zhengjia
    Tian, Maoqing
    Liu, Jianbo
    Yi, Shuai
    Li, Hongsheng
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 13013 - 13022
  • [9] MMF-Net: A novel multi-feature and multi-level fusion network for 3D human pose estimation
    Li, Qianxing
    Kong, Dehui
    Li, Jinghua
    Yin, Baocai
    IET COMPUTER VISION, 2025, 19 (01)
  • [10] Pose Guided RGBD Feature Learning for 3D Object Pose Estimation
    Balntas, Vassileios
    Doumanoglou, Andreas
    Sahin, Caner
    Sock, Juil
    Kouskouridas, Rigas
    Kim, Tae-Kyun
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 3876 - 3884