SMOC-Net: Leveraging Camera Pose for Self-Supervised Monocular Object Pose Estimation

Cited by: 4
Authors
Tan, Tao [1 ,2 ]
Dong, Qiulei [1 ,2 ,3 ]
Affiliations
[1] UCAS, Sch Artificial Intelligence, Beijing, Peoples R China
[2] CASIA, State Key Lab Multimodal Artificial Intelligence, Beijing, Peoples R China
[3] Chinese Acad Sci, Ctr Excellence Brain Sci & Intelligence Technol, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
DOI
10.1109/CVPR52729.2023.02041
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, self-supervised 6D object pose estimation, where synthetic images with object pose annotations (sometimes jointly with unannotated real images) are used for training, has attracted much attention in computer vision. Some typical works in the literature employ a time-consuming differentiable renderer for object pose prediction at the training stage, so that (i) their performance on real images is generally limited due to the gap between the rendered images and real images, and (ii) their training process is computationally expensive. To address these two problems, we propose a novel Network for Self-supervised Monocular Object pose estimation, called SMOC-Net, which utilizes camera poses predicted from unannotated real images. The proposed network is explored under a knowledge distillation framework consisting of a teacher model and a student model. The teacher model contains a backbone estimation module for initial object pose estimation and an object pose refiner that refines the initial object poses using a geometric constraint (called the relative-pose constraint) derived from relative camera poses. The student model gains knowledge for object pose estimation from the teacher model by imposing the relative-pose constraint. Thanks to the relative-pose constraint, SMOC-Net can not only narrow the domain gap between synthetic and real data but also reduce the training cost. Experimental results on two public datasets demonstrate that SMOC-Net outperforms several state-of-the-art methods by a large margin while requiring much less training time than differentiable-renderer-based methods.
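The relative-pose constraint described above rests on a simple geometric fact: for a static object observed from two camera viewpoints, the object pose in the second camera frame must equal the relative camera pose composed with the object pose in the first camera frame. Below is a minimal NumPy sketch of a residual based on that fact; the function names and error measures are illustrative assumptions, not taken from the paper's implementation.

```python
import numpy as np

def se3(R, t):
    """Assemble a 4x4 homogeneous transform from rotation R and translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def relative_pose_residual(T_obj_c1, T_obj_c2, T_c2_c1):
    """Residual of the relative-pose constraint for a static object.

    If the object does not move between views, its pose in camera 2
    should satisfy  T_obj_c2 = T_c2_c1 @ T_obj_c1.  Returns the
    rotation error (geodesic angle, radians) and translation error
    between the composed prediction and the given pose.
    """
    T_pred = T_c2_c1 @ T_obj_c1
    # Geodesic angle between the two rotations via the trace formula
    R_diff = T_pred[:3, :3].T @ T_obj_c2[:3, :3]
    cos_angle = np.clip((np.trace(R_diff) - 1.0) / 2.0, -1.0, 1.0)
    rot_err = np.arccos(cos_angle)
    trans_err = np.linalg.norm(T_pred[:3, 3] - T_obj_c2[:3, 3])
    return rot_err, trans_err
```

In a self-supervised setting, such a residual can serve as a training signal on unannotated real image pairs: the relative camera pose is predicted from the images themselves, so no object pose labels are needed.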
Pages: 21307-21316
Page count: 10
Related papers
50 records
  • [41] SS-Pose: Self-Supervised 6-D Object Pose Representation Learning Without Rendering
    Mu, Fengjun
    Huang, Rui
    Zhang, Jingting
    Zou, Chaobin
    Shi, Kecheng
    Sun, Shixiang
    Zhan, Huayi
    Zhao, Pengbo
    Qiu, Jing
    Cheng, Hong
    IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2024, 20 (12) : 13665 - 13675
  • [42] Self-Supervised Domain Adaptation for 6DoF Pose Estimation
    Jin, Juseong
    Jeong, Eunju
    Cho, Joonmyun
    Kim, Young-Gon
    IEEE ACCESS, 2024, 12 : 101528 - 101535
  • [43] Robotic Grasp Detection Based on Category-Level Object Pose Estimation With Self-Supervised Learning
    Yu, Sheng
    Zhai, Di-Hua
    Xia, Yuanqing
    IEEE-ASME TRANSACTIONS ON MECHATRONICS, 2024, 29 (01) : 625 - 635
  • [44] SoftPOSIT Enhancements for Monocular Camera Spacecraft Pose Estimation
    Shi, Jian-Feng
    Ulrich, Steve
    2016 21ST INTERNATIONAL CONFERENCE ON METHODS AND MODELS IN AUTOMATION AND ROBOTICS (MMAR), 2016, : 30 - 35
  • [45] Indirect Object-to-Robot Pose Estimation from an External Monocular RGB Camera
    Tremblay, Jonathan
    Tyree, Stephen
    Mosier, Terry
    Birchfield, Stan
    2020 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2020, : 4227 - 4234
  • [46] Transformers in Self-Supervised Monocular Depth Estimation with Unknown Camera Intrinsics
    Varma, Arnav
    Chawla, Hemang
    Zonooz, Bahram
    Arani, Elahe
    PROCEEDINGS OF THE 17TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS (VISAPP), VOL 4, 2022, : 758 - 769
  • [47] Fine-Grained Object Classification via Self-Supervised Pose Alignment
    Yang, Xuhui
    Wang, Yaowei
    Chen, Ke
    Xu, Yong
    Tian, Yonghong
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 7389 - 7398
  • [48] Pose estimation of moving object based-on dual quaternion from monocular camera
    Feng, Guohu
    Zhang, Dayong
    Wu, Wenqi
    Wuhan Daxue Xuebao (Xinxi Kexue Ban)/Geomatics and Information Science of Wuhan University, 2010, 35 (10): : 1147 - 1150
  • [49] Enhanced self-supervised monocular depth estimation with self-attention and joint depth-pose loss for laparoscopic images
    Li, Wenda
    Hayashi, Yuichiro
    Oda, Masahiro
    Kitasaka, Takayuki
    Misawa, Kazunari
    Mori, Kensaku
    INTERNATIONAL JOURNAL OF COMPUTER ASSISTED RADIOLOGY AND SURGERY, 2025, : 775 - 785
  • [50] SAM-Net: Semantic probabilistic and attention mechanisms of dynamic objects for self-supervised depth and camera pose estimation in visual odometry applications
    Yang, Binchao
    Xu, Xinying
    Ren, Jinchang
    Cheng, Lan
    Guo, Lei
    Zhang, Zhe
    PATTERN RECOGNITION LETTERS, 2022, 153 : 126 - 135