SMOC-Net: Leveraging Camera Pose for Self-Supervised Monocular Object Pose Estimation

Cited by: 4
Authors
Tan, Tao [1, 2]
Dong, Qiulei [1, 2, 3]
Affiliations
[1] UCAS, Sch Artificial Intelligence, Beijing, Peoples R China
[2] CASIA, State Key Lab Multimodal Artificial Intelligence, Beijing, Peoples R China
[3] Chinese Acad Sci, Ctr Excellence Brain Sci & Intelligence Technol, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
DOI
10.1109/CVPR52729.2023.02041
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Self-supervised 6D object pose estimation, in which synthetic images with object poses (sometimes jointly with unannotated real images) are used for training, has recently attracted much attention in computer vision. Some typical works in the literature employ a time-consuming differentiable renderer for object pose prediction at the training stage, so that (i) their performance on real images is generally limited by the gap between rendered and real images, and (ii) their training process is computationally expensive. To address these two problems, we propose SMOC-Net, a novel Network for Self-supervised Monocular Object pose estimation that utilizes the Camera poses predicted from unannotated real images. The proposed network is built under a knowledge distillation framework consisting of a teacher model and a student model. The teacher model contains a backbone estimation module for initial object pose estimation and an object pose refiner that refines the initial object poses using a geometric constraint (called the relative-pose constraint) derived from relative camera poses. The student model gains knowledge for object pose estimation from the teacher model by imposing the relative-pose constraint. Thanks to the relative-pose constraint, SMOC-Net not only narrows the domain gap between synthetic and real data but also reduces the training cost. Experimental results on two public datasets demonstrate that SMOC-Net outperforms several state-of-the-art methods by a large margin while requiring much less training time than differentiable-renderer-based methods.
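The abstract does not state the exact form of the relative-pose constraint. As a rough illustration only, the sketch below assumes the standard multi-view identity for a static object: if the relative camera pose maps camera-1 coordinates to camera-2 coordinates, the object pose in view 2 should equal that relative pose composed with the object pose in view 1. All function names and numeric values are hypothetical and are not taken from the paper.

```python
import math

def matmul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def rot_z(angle, translation):
    """Rigid transform: rotation about z by `angle` radians, then a translation."""
    c, s = math.cos(angle), math.sin(angle)
    x, y, z = translation
    return [[c, -s, 0.0, x],
            [s, c, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]

def relative_pose_residual(obj_in_cam1, obj_in_cam2, cam1_to_cam2):
    """Largest absolute entry of (cam1_to_cam2 @ obj_in_cam1) - obj_in_cam2.

    Assumed consistency check, not the paper's loss: the residual is zero
    when the two object poses agree with the relative camera pose.
    """
    predicted = matmul(cam1_to_cam2, obj_in_cam1)
    return max(abs(predicted[i][j] - obj_in_cam2[i][j])
               for i in range(4) for j in range(4))

# Hypothetical static object observed from two camera views.
obj_in_cam1 = rot_z(0.3, (0.1, 0.0, 1.0))
cam1_to_cam2 = rot_z(-0.2, (0.05, 0.02, 0.0))
obj_in_cam2 = matmul(cam1_to_cam2, obj_in_cam1)

# A pose pair that satisfies the identity, and one with a rotation error in view 1.
consistent = relative_pose_residual(obj_in_cam1, obj_in_cam2, cam1_to_cam2)
perturbed = relative_pose_residual(rot_z(0.4, (0.1, 0.0, 1.0)), obj_in_cam2, cam1_to_cam2)
```

Under this assumption, a residual of this kind could serve as a training signal that needs only relative camera poses, with no rendering step. How SMOC-Net actually formulates and weights the constraint is described in the full paper, not in this record.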
Pages: 21307-21316
Page count: 10