CVML-Pose: Convolutional VAE Based Multi-Level Network for Object 3D Pose Estimation

Cited by: 3

Authors:

Zhao, Jianyu [1]

Sanderson, Edward [1]

Matuszewski, Bogdan J. J. [1]

Affiliations:

[1] Univ Cent Lancashire, Comp Vis & Machine Learning CVML Grp, Preston PR1 2HE, England

Source:

IEEE Access | 2023 / Vol. 11

Funding:

Engineering and Physical Sciences Research Council (EPSRC), UK;

Keywords:

3D pose estimation; deep learning; variational autoencoder; synthetic data; 6D pose;

DOI:

10.1109/ACCESS.2023.3243551

Chinese Library Classification (CLC):

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Most vision-based 3D pose estimation approaches rely on knowledge of the object's 3D model and on depth measurements, and often require time-consuming iterative refinement to improve accuracy. These requirements can be seen as limiting factors for broader real-life applications, and addressing them is the main motivation for this paper. To this end, a novel Convolutional Variational Auto-Encoder based Multi-Level Network for object 3D pose estimation (CVML-Pose) is proposed. Unlike most other methods, CVML-Pose implicitly learns an object's 3D pose from RGB images alone, encoding it in its latent space, without knowing the object's 3D model, using depth information, or performing post-refinement. CVML-Pose consists of two main modules: (i) CVML-AE, a convolutional variational autoencoder whose role is to extract features from RGB images, and (ii) Multi-Layer Perceptron and K-Nearest Neighbor regressors that map the latent variables to the object's 3D pose, comprising, respectively, rotation and translation. CVML-Pose has been evaluated on the LineMod and LineMod-Occlusion benchmark datasets. It has been shown to outperform other methods based on latent representations and to achieve results comparable to the state of the art, but without the use of a 3D model or depth measurements. Using the t-Distributed Stochastic Neighbor Embedding algorithm, the CVML-Pose latent space is shown to successfully represent objects' category and topology. This opens up the prospect of integrated estimation of pose and other attributes (possibly including surface finish or shape variations), which, with real-time processing enabled by the absence of iterative refinement, can facilitate various robotic applications. Code available: https://github.com/JZhao12/CVML-Pose.
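
As a rough illustration of the second module, the sketch below shows how a K-Nearest Neighbor regressor can map a latent vector to a translation estimate by averaging the translations of the closest training latents. It is not the authors' implementation (see the repository linked in the abstract for that): the function name `knn_regress`, the unweighted Euclidean averaging, the value of `k`, and the toy 2-D latents and translations are all assumptions made up for this example.

```python
import math


def knn_regress(train_latents, train_targets, query, k=3):
    """Average the targets of the k training latents closest to `query`.

    Hypothetical helper for illustration; distance metric and
    unweighted averaging are assumptions, not taken from the paper.
    """
    # Euclidean distance from the query to every stored latent vector,
    # paired with the index of that training sample.
    dists = sorted((math.dist(z, query), i) for i, z in enumerate(train_latents))
    nearest = [i for _, i in dists[:k]]
    dim = len(train_targets[0])
    # Component-wise mean of the selected target vectors.
    return [sum(train_targets[i][d] for i in nearest) / len(nearest)
            for d in range(dim)]


# Toy data, invented for this example: 2-D "latents" and 3-D translations.
train_latents = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
train_translations = [[0.0, 0.0, 1.0], [0.2, 0.0, 1.0],
                      [0.0, 0.2, 1.0], [1.0, 1.0, 2.0]]

estimate = knn_regress(train_latents, train_translations, [0.1, 0.1], k=3)
print(estimate)
```

In a real pipeline the latents would come from the trained encoder rather than being written by hand, and the latent dimensionality would be much higher than two.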


Pages: 13830-13845

Page count: 16
