CVML-Pose: Convolutional VAE Based Multi-Level Network for Object 3D Pose Estimation

Times cited: 3
Authors
Zhao, Jianyu [1]
Sanderson, Edward [1]
Matuszewski, Bogdan J. [1]
Affiliation
[1] University of Central Lancashire, Computer Vision and Machine Learning (CVML) Group, Preston PR1 2HE, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
3D pose estimation; deep learning; variational autoencoder; synthetic data; 6D pose
DOI
10.1109/ACCESS.2023.3243551
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Most vision-based 3D pose estimation approaches rely on knowledge of the object's 3D model and/or on depth measurements, and often require time-consuming iterative refinement to improve accuracy. These requirements limit their use in broader real-life applications, and addressing them is the main motivation for this paper. To this end, a novel Convolutional Variational Auto-Encoder based Multi-Level Network for object 3D pose estimation (CVML-Pose) is proposed. Unlike most other methods, CVML-Pose implicitly learns an object's 3D pose from RGB images alone, encoded in its latent space, without knowing the object's 3D model or depth information and without performing post-refinement. CVML-Pose consists of two main modules: (i) CVML-AE, a convolutional variational autoencoder that extracts features from RGB images; and (ii) Multi-Layer Perceptron and K-Nearest Neighbor regressors that map the latent variables to the object's 3D rotation and translation, respectively. CVML-Pose has been evaluated on the LineMod and LineMod-Occlusion benchmark datasets. It has been shown to outperform other methods based on latent representations and to achieve results comparable to the state of the art, without using a 3D model or depth measurements. Using the t-Distributed Stochastic Neighbor Embedding (t-SNE) algorithm, the CVML-Pose latent space is shown to represent objects' category and topology. This opens up the prospect of jointly estimating pose and other attributes (possibly including surface finish or shape variations), which, combined with real-time processing enabled by the absence of iterative refinement, could facilitate various robotic applications. Code available: https://github.com/JZhao12/CVML-Pose.
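The second stage described in the abstract, regressors that map latent codes to pose, can be illustrated with a minimal sketch. This is not the authors' implementation (that is in the repository linked in the abstract). It assumes an already-trained encoder has produced a mean `mu` and log-variance `log_var` for each image; the function names `reparameterize` and `knn_regress`, the choice k=3, Euclidean distance in latent space, and plain averaging of the neighbours' translations are all illustrative assumptions. The MLP rotation regressor is omitted.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def reparameterize(mu: Sequence[float], log_var: Sequence[float], rng: random.Random) -> Vector:
    """Standard VAE reparameterization trick: z = mu + sigma * eps, sigma = exp(0.5 * log_var)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def knn_regress(query: Sequence[float],
                train_latents: Sequence[Sequence[float]],
                train_targets: Sequence[Sequence[float]],
                k: int = 3) -> Vector:
    """Hypothetical k-NN regressor: predict a target (e.g. a 3D translation)
    as the mean target of the k training latents closest to `query`."""
    if not 1 <= k <= len(train_latents):
        raise ValueError("k must be between 1 and the number of training samples")
    # Rank training samples by Euclidean distance in latent space (assumed metric).
    ranked = sorted(range(len(train_latents)),
                    key=lambda i: math.dist(query, train_latents[i]))
    nearest = ranked[:k]
    dim = len(train_targets[0])
    # Average each target coordinate over the k nearest neighbours.
    return [sum(train_targets[i][d] for i in nearest) / k for d in range(dim)]


if __name__ == "__main__":
    # Toy latent codes (2-D for illustration only) paired with made-up translations.
    latents = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
    translations = [[0.0, 0.0, 1.0], [0.1, 0.0, 1.0], [0.0, 0.1, 1.0], [1.0, 1.0, 2.0]]

    rng = random.Random(0)
    mu, log_var = [0.2, 0.2], [-2.0, -2.0]
    z = reparameterize(mu, log_var, rng)  # stochastic sample, as drawn during VAE training
    print("sampled z:", z)
    # Query with the mean; using mu deterministically at inference is an assumption here.
    print("predicted translation:", knn_regress(mu, latents, translations, k=3))
```

With this toy data the far-away latent `[5.0, 5.0]` is excluded, so the prediction averages only the three nearby translations. A nearest-neighbour lookup needs no training beyond storing the latent codes, which is one plausible reason to prefer it for translation, though the paper itself should be consulted for the authors' rationale.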
Pages: 13830-13845
Number of pages: 16