SMOC-Net: Leveraging Camera Pose for Self-Supervised Monocular Object Pose Estimation

Cited by: 4

Authors:

Tan, Tao [1,2]

Dong, Qiulei [1,2,3]

Affiliations:

[1] UCAS, Sch Artificial Intelligence, Beijing, Peoples R China

[2] CASIA, State Key Lab Multimodal Artificial Intelligence, Beijing, Peoples R China

[3] Chinese Acad Sci, Ctr Excellence Brain Sci & Intelligence Technol, Beijing, Peoples R China

Source:

2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2023

Funding:

National Natural Science Foundation of China;

Keywords:

DOI:

10.1109/CVPR52729.2023.02041

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Recently, self-supervised 6D object pose estimation, where synthetic images with object poses (sometimes jointly with unannotated real images) are used for training, has attracted much attention in computer vision. Some typical works in the literature employ a time-consuming differentiable renderer for object pose prediction at the training stage, so that (i) their performance on real images is generally limited due to the gap between their rendered images and real images and (ii) their training process is computationally expensive. To address the two problems, we propose a novel Network for Self-supervised Monocular Object pose estimation by utilizing the predicted Camera poses from unannotated real images, called SMOC-Net. The proposed network is explored under a knowledge distillation framework, consisting of a teacher model and a student model. The teacher model contains a backbone estimation module for initial object pose estimation, and an object pose refiner for refining the initial object poses using a geometric constraint (called relative-pose constraint) derived from relative camera poses. The student model gains knowledge for object pose estimation from the teacher model by imposing the relative-pose constraint. Thanks to the relative-pose constraint, SMOC-Net can not only narrow the domain gap between synthetic and real data but also reduce the training cost. Experimental results on two public datasets demonstrate that SMOC-Net outperforms several state-of-the-art methods by a large margin while requiring much less training time than the differentiable-renderer-based methods.
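The abstract does not give the exact form of the relative-pose constraint, so the following is only an illustrative sketch of the underlying rigid-geometry idea, not the authors' loss. It assumes a static object, poses written as 4x4 object-to-camera homogeneous transforms, and a known transform from camera view i to camera view j; under those assumptions the object pose in view j should equal the relative camera pose composed with the object pose in view i. All function and variable names here (`matmul`, `make_pose`, `relative_pose_residual`) are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def make_pose(yaw_rad, translation):
    """Build a 4x4 rigid transform: rotation about the z-axis plus a translation."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    tx, ty, tz = translation
    return [[c, -s, 0.0, tx],
            [s, c, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

def relative_pose_residual(obj_in_cam_i, obj_in_cam_j, cam_i_to_cam_j):
    """Compare the object pose predicted in view j with the pose transferred
    from view i. Returns (rotation error in radians, translation error)."""
    # Assumption: static object, so T_obj^j should equal T_{j<-i} @ T_obj^i.
    transferred = matmul(cam_i_to_cam_j, obj_in_cam_i)
    # Rotation error: angle of R_transferred^T @ R_predicted, read from its trace.
    trace = sum(transferred[k][i] * obj_in_cam_j[k][i]
                for i in range(3) for k in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    rot_err = math.acos(cos_angle)
    # Translation error: Euclidean distance between the two translation vectors.
    trans_err = math.sqrt(sum((transferred[i][3] - obj_in_cam_j[i][3]) ** 2
                              for i in range(3)))
    return rot_err, trans_err

# Illustrative values only (not taken from the paper).
obj_in_cam_i = make_pose(0.3, (0.1, 0.2, 1.0))
cam_i_to_cam_j = make_pose(-0.5, (0.0, 0.1, -0.2))
consistent_obj_in_cam_j = matmul(cam_i_to_cam_j, obj_in_cam_i)
rotated_obj_in_cam_j = matmul(consistent_obj_in_cam_j,
                              make_pose(0.1, (0.0, 0.0, 0.0)))
```

A pair of predictions that agrees with the relative camera pose gives a residual close to zero, while a prediction in view j that is off by a small rotation gives a rotation error equal to that angle. In a training setup such a residual could serve as a penalty term, which is one plausible reading of how a constraint of this kind avoids rendering images.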

Citation

Pages: 21307-21316

Page count: 10
