CVML-Pose: Convolutional VAE Based Multi-Level Network for Object 3D Pose Estimation

Cited by: 3
Authors
Zhao, Jianyu [1]
Sanderson, Edward [1]
Matuszewski, Bogdan J. J. [1]
Affiliation
[1] Univ Cent Lancashire, Comp Vis & Machine Learning CVML Grp, Preston PR1 2HE, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
3D pose estimation; deep learning; variational autoencoder; synthetic data; 6D pose
DOI
10.1109/ACCESS.2023.3243551
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Most vision-based 3D pose estimation approaches rely on knowledge of the object's 3D model and on depth measurements, and often require time-consuming iterative refinement to improve accuracy. These requirements can be seen as limiting factors for broader real-life applications, and addressing them is the main motivation for this paper. To this end, a novel Convolutional Variational Auto-Encoder based Multi-Level Network for object 3D pose estimation (CVML-Pose) is proposed. Unlike most other methods, CVML-Pose implicitly learns an object's 3D pose, encoded in its latent space, from RGB images alone, without knowing the object's 3D model or depth information and without performing post-refinement. CVML-Pose consists of two main modules: (i) CVML-AE, a convolutional variational autoencoder that extracts features from RGB images; and (ii) Multi-Layer Perceptron and K-Nearest Neighbor regressors that map the latent variables to the object's 3D rotation and translation, respectively. CVML-Pose has been evaluated on the LineMod and LineMod-Occlusion benchmark datasets, where it outperforms other methods based on latent representations and achieves results comparable to the state of the art, without using a 3D model or depth measurements. Using the t-Distributed Stochastic Neighbor Embedding (t-SNE) algorithm, the CVML-Pose latent space is shown to successfully represent objects' category and topology. This opens up the prospect of jointly estimating pose and other attributes (possibly including surface finish or shape variations), which, together with the real-time processing enabled by the absence of iterative refinement, can facilitate various robotic applications. Code available: https://github.com/JZhao12/CVML-Pose.
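The two-stage idea in the abstract (an encoder that produces a latent code, followed by regressors that read pose from that code) can be illustrated with a minimal NumPy sketch. It assumes the standard VAE formulation with a diagonal-Gaussian latent and a KNN regressor that averages the targets of the k nearest training latents under Euclidean distance. The function names and toy data are hypothetical, the convolutional encoder and the MLP rotation head are omitted, and this is not the authors' implementation; that lives in the linked repository.

```python
import numpy as np


def reparameterize(mu, logvar, rng):
    """Sample z ~ N(mu, diag(exp(logvar))) via the reparameterization trick.

    Assumption: standard diagonal-Gaussian VAE latent, not taken from the paper.
    """
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps


def kl_to_standard_normal(mu, logvar):
    """Per-sample KL(N(mu, sigma^2) || N(0, I)), summed over latent dimensions."""
    return -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=-1)


def knn_regress(train_z, train_y, query_z, k=3):
    """Hypothetical KNN regressor: mean target of the k nearest training latents."""
    # Pairwise Euclidean distances, shape (n_query, n_train).
    dists = np.linalg.norm(query_z[:, None, :] - train_z[None, :, :], axis=-1)
    # Indices of the k closest training latents for each query.
    nearest = np.argsort(dists, axis=1)[:, :k]
    # train_y[nearest] has shape (n_query, k, target_dim); average over neighbors.
    return train_y[nearest].mean(axis=1)


# Toy usage with made-up data: 8-D latent means for 100 "training images"
# and hypothetical (x, y, z) translation targets.
rng = np.random.default_rng(0)
train_mu = rng.standard_normal((100, 8))
train_t = rng.uniform(-100.0, 100.0, size=(100, 3))

# Queries lying almost on top of the first two training latents.
query = train_mu[:2] + 1e-3
pred_t = knn_regress(train_mu, train_t, query, k=1)
```

In practice the latent means would come from a trained encoder rather than random numbers. A 2D view of such latents, of the kind the abstract describes, could be produced with scikit-learn's `sklearn.manifold.TSNE(n_components=2).fit_transform(...)`.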
Pages: 13830-13845
Number of pages: 16