SMOC-Net: Leveraging Camera Pose for Self-Supervised Monocular Object Pose Estimation

Cited by: 4
Authors
Tan, Tao [1 ,2 ]
Dong, Qiulei [1 ,2 ,3 ]
Affiliations
[1] UCAS, Sch Artificial Intelligence, Beijing, Peoples R China
[2] CASIA, State Key Lab Multimodal Artificial Intelligence, Beijing, Peoples R China
[3] Chinese Acad Sci, Ctr Excellence Brain Sci & Intelligence Technol, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
DOI
10.1109/CVPR52729.2023.02041
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, self-supervised 6D object pose estimation, where synthetic images with object pose annotations (sometimes jointly with unannotated real images) are used for training, has attracted much attention in computer vision. Some typical works in the literature employ a time-consuming differentiable renderer for object pose prediction at the training stage, so that (i) their performance on real images is generally limited due to the gap between the rendered images and real images, and (ii) their training process is computationally expensive. To address these two problems, we propose a novel Network for Self-supervised Monocular Object pose estimation, called SMOC-Net, which utilizes camera poses predicted from unannotated real images. The proposed network is explored under a knowledge distillation framework consisting of a teacher model and a student model. The teacher model contains a backbone estimation module for initial object pose estimation and an object pose refiner that refines the initial object poses using a geometric constraint (called the relative-pose constraint) derived from relative camera poses. The student model gains knowledge for object pose estimation from the teacher model by imposing the relative-pose constraint. Thanks to the relative-pose constraint, SMOC-Net can not only narrow the domain gap between synthetic and real data but also reduce the training cost. Experimental results on two public datasets demonstrate that SMOC-Net outperforms several state-of-the-art methods by a large margin while requiring much less training time than differentiable-renderer-based methods.
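The relative-pose constraint described above rests on a simple geometric fact: for a static object observed from two camera viewpoints, the object pose in the second camera frame must equal the relative camera pose composed with the object pose in the first camera frame. Below is a minimal NumPy sketch of a residual based on that fact; the function names and error measures are illustrative assumptions, not taken from the paper's implementation.

```python
import numpy as np

def se3(R, t):
    """Assemble a 4x4 homogeneous transform from rotation R and translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def relative_pose_residual(T_obj_c1, T_obj_c2, T_c2_c1):
    """Residual of the relative-pose constraint for a static object.

    If the object does not move between views, its pose in camera 2
    should satisfy  T_obj_c2 = T_c2_c1 @ T_obj_c1.  Returns the
    rotation error (geodesic angle, radians) and translation error
    between the composed prediction and the given pose.
    """
    T_pred = T_c2_c1 @ T_obj_c1
    # Geodesic angle between the two rotations via the trace formula
    R_diff = T_pred[:3, :3].T @ T_obj_c2[:3, :3]
    cos_angle = np.clip((np.trace(R_diff) - 1.0) / 2.0, -1.0, 1.0)
    rot_err = np.arccos(cos_angle)
    trans_err = np.linalg.norm(T_pred[:3, 3] - T_obj_c2[:3, 3])
    return rot_err, trans_err
```

In a self-supervised setting, such a residual can serve as a training signal on unannotated real image pairs: the relative camera pose is predicted from the images themselves, so no object pose labels are needed.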
Pages: 21307-21316
Page count: 10
Related papers
50 records
  • [41] SS-Pose: Self-Supervised 6-D Object Pose Representation Learning Without Rendering
    Mu, Fengjun
    Huang, Rui
    Zhang, Jingting
    Zou, Chaobin
    Shi, Kecheng
    Sun, Shixiang
    Zhan, Huayi
    Zhao, Pengbo
    Qiu, Jing
    Cheng, Hong
    IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2024, 20 (12) : 13665 - 13675
  • [42] Self-Supervised Domain Adaptation for 6DoF Pose Estimation
    Jin, Juseong
    Jeong, Eunju
    Cho, Joonmyun
    Kim, Young-Gon
    IEEE ACCESS, 2024, 12 : 101528 - 101535
  • [43] Robotic Grasp Detection Based on Category-Level Object Pose Estimation With Self-Supervised Learning
    Yu, Sheng
    Zhai, Di-Hua
    Xia, Yuanqing
    IEEE-ASME TRANSACTIONS ON MECHATRONICS, 2024, 29 (01) : 625 - 635
  • [44] SoftPOSIT Enhancements for Monocular Camera Spacecraft Pose Estimation
    Shi, Jian-Feng
    Ulrich, Steve
    2016 21ST INTERNATIONAL CONFERENCE ON METHODS AND MODELS IN AUTOMATION AND ROBOTICS (MMAR), 2016, : 30 - 35
  • [45] Indirect Object-to-Robot Pose Estimation from an External Monocular RGB Camera
    Tremblay, Jonathan
    Tyree, Stephen
    Mosier, Terry
    Birchfield, Stan
    2020 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2020, : 4227 - 4234
  • [46] Transformers in Self-Supervised Monocular Depth Estimation with Unknown Camera Intrinsics
    Varma, Arnav
    Chawla, Hemang
    Zonooz, Bahram
    Arani, Elahe
    PROCEEDINGS OF THE 17TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS (VISAPP), VOL 4, 2022, : 758 - 769
  • [47] Fine-Grained Object Classification via Self-Supervised Pose Alignment
    Yang, Xuhui
    Wang, Yaowei
    Chen, Ke
    Xu, Yong
    Tian, Yonghong
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 7389 - 7398
  • [48] Pose estimation of moving object based-on dual quaternion from monocular camera
    Feng, Guohu
    Zhang, Dayong
    Wu, Wenqi
    Wuhan Daxue Xuebao (Xinxi Kexue Ban)/Geomatics and Information Science of Wuhan University, 2010, 35 (10): : 1147 - 1150
  • [49] Enhanced self-supervised monocular depth estimation with self-attention and joint depth-pose loss for laparoscopic images
    Li, Wenda
    Hayashi, Yuichiro
    Oda, Masahiro
    Kitasaka, Takayuki
    Misawa, Kazunari
    Mori, Kensaku
    INTERNATIONAL JOURNAL OF COMPUTER ASSISTED RADIOLOGY AND SURGERY, 2025, : 775 - 785
  • [50] SAM-Net: Semantic probabilistic and attention mechanisms of dynamic objects for self-supervised depth and camera pose estimation in visual odometry applications
    Yang, Binchao
    Xu, Xinying
    Ren, Jinchang
    Cheng, Lan
    Guo, Lei
    Zhang, Zhe
    PATTERN RECOGNITION LETTERS, 2022, 153 : 126 - 135