SMOC-Net: Leveraging Camera Pose for Self-Supervised Monocular Object Pose Estimation

Cited by: 4
Authors
Tan, Tao [1, 2]
Dong, Qiulei [1, 2, 3]
Affiliations
[1] UCAS, Sch Artificial Intelligence, Beijing, People's Republic of China
[2] CASIA, State Key Lab Multimodal Artificial Intelligence, Beijing, People's Republic of China
[3] Chinese Acad Sci, Ctr Excellence Brain Sci & Intelligence Technol, Beijing, People's Republic of China
Funding
National Natural Science Foundation of China
DOI
10.1109/CVPR52729.2023.02041
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Self-supervised 6D object pose estimation, in which synthetic images with object poses (sometimes together with unannotated real images) are used for training, has recently attracted much attention in computer vision. Some typical works in the literature employ a time-consuming differentiable renderer for object pose prediction at the training stage, so that (i) their performance on real images is generally limited by the gap between their rendered images and real images, and (ii) their training process is computationally expensive. To address these two problems, we propose SMOC-Net, a novel network for Self-supervised Monocular Object pose estimation that utilizes the Camera poses predicted from unannotated real images. The proposed network is built on a knowledge distillation framework consisting of a teacher model and a student model. The teacher model contains a backbone estimation module for initial object pose estimation and an object pose refiner that refines the initial object poses using a geometric constraint, called the relative-pose constraint, derived from relative camera poses. The student model gains knowledge for object pose estimation from the teacher model by imposing the relative-pose constraint. Thanks to the relative-pose constraint, SMOC-Net not only narrows the domain gap between synthetic and real data but also reduces the training cost. Experimental results on two public datasets demonstrate that SMOC-Net outperforms several state-of-the-art methods by a large margin while requiring much less training time than the differentiable-renderer-based methods.
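The abstract does not spell out the relative-pose constraint, so the following is only a minimal sketch of the kind of geometric relation such a constraint could build on, not the authors' formulation. It assumes a static object seen from two camera views: if `obj_in_cam1` and `obj_in_cam2` are the object's poses in the two camera frames and `cam1_to_cam2` is the relative camera pose, then `obj_in_cam2` should equal `cam1_to_cam2` composed with `obj_in_cam1`. All function names and numeric values are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def rigid_transform_z(angle_rad, translation):
    """Build a 4x4 rigid transform: rotation about the z axis plus a translation."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    tx, ty, tz = translation
    return [[c, -s, 0.0, tx],
            [s, c, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

def relative_pose_residual(obj_in_cam1, obj_in_cam2, cam1_to_cam2):
    """Largest absolute entry of (cam1_to_cam2 * obj_in_cam1 - obj_in_cam2).

    A value near zero means the two object poses agree with the relative
    camera pose (assumed relation, not taken from the paper).
    """
    predicted = matmul(cam1_to_cam2, obj_in_cam1)
    return max(abs(predicted[i][j] - obj_in_cam2[i][j])
               for i in range(4) for j in range(4))

# Hypothetical poses, chosen only for illustration.
obj_in_cam1 = rigid_transform_z(0.3, (0.1, 0.0, 1.0))
cam1_to_cam2 = rigid_transform_z(0.2, (0.05, -0.02, 0.0))
obj_in_cam2 = matmul(cam1_to_cam2, obj_in_cam1)  # geometrically consistent pose

consistent = relative_pose_residual(obj_in_cam1, obj_in_cam2, cam1_to_cam2)
# An inconsistent second-view estimate yields a clearly nonzero residual.
perturbed = relative_pose_residual(
    obj_in_cam1, rigid_transform_z(0.6, (0.1, 0.0, 1.0)), cam1_to_cam2)
```

In a training setting, a residual of this kind could serve as a penalty on pose predictions from pairs of real images, which would avoid rendering; whether SMOC-Net uses this exact form is not stated in the abstract.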
Pages: 21307-21316
Number of pages: 10