Joint self-supervised learning of interest point, descriptor, depth, and ego-motion from monocular video

Times Cited: 0
Authors
Wang, Zhongyi [1,2]
Shen, Mengjiao [1]
Chen, Qijun [1,2]
Affiliations
[1] Tongji Univ, Coll Elect & Informat Engn, Shanghai 201804, Peoples R China
[2] Tongji Univ, Shanghai Res Inst Intelligent Autonomous Syst, Shanghai 201210, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Self-supervised learning; Depth estimation; Ego-motion estimation; Interest point learning; Visual odometry; Features
DOI
10.1007/s11042-024-18382-x
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
This paper addresses the self-supervised learning of several low-level vision tasks that are critical to Visual Simultaneous Localization and Mapping (VSLAM): interest point learning, descriptor learning, ego-motion estimation, and depth estimation. Our key insight is that appearance and geometry constraints can be used to couple these fundamental vision problems. We propose a self-supervised framework that jointly trains neural networks for multiple objectives, thereby tackling a complex problem, simplifying the overall system, and providing the key information required by deep monocular VSLAM systems. First, we feed two adjacent images into pose and depth networks to obtain their corresponding depth maps and camera poses. Then, a differentiable geometry module uses the depth maps and camera poses to synthesize the pseudo-input images needed by the interest point network and to construct the geometry loss. Next, we feed the pseudo-input image and the source image into the interest point network to obtain the corresponding interest points, descriptors, and scores, from which we construct the appearance loss. Finally, we combine the geometry and appearance losses to constrain the whole network in an unsupervised manner. The novelty of this paper is that it integrates the key information needed by monocular VSLAM into a unified framework that handles interest point learning, descriptor learning, ego-motion estimation, and depth estimation simultaneously. Without any ground truth, our model jointly learns these sub-problems in a self-supervised manner and achieves state-of-the-art performance in their respective domains.
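The pipeline described in the abstract hinges on the differentiable geometry module: predicted depth and relative pose are used to reproject one view into the other, producing the pseudo-input image and enabling the geometry loss. The sketch below illustrates this standard inverse-warping operation under a pinhole camera model. It is a minimal PyTorch illustration of the technique, not the authors' code; the function name `inverse_warp` and the intrinsics/pose conventions are assumptions.

```python
# Minimal sketch of differentiable inverse warping (view synthesis), a
# common core of self-supervised depth/ego-motion frameworks. Assumed,
# not from the paper: names, argument layout, pose convention.
import torch
import torch.nn.functional as F

def inverse_warp(src_img, tgt_depth, T, K):
    """Synthesize the target view by sampling src_img.

    src_img:   (B, 3, H, W) source image
    tgt_depth: (B, 1, H, W) predicted depth of the target view
    T:         (B, 4, 4)    relative pose, target -> source frame
    K:         (B, 3, 3)    camera intrinsics
    """
    B, _, H, W = src_img.shape
    # Pixel grid of the target view in homogeneous coordinates.
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=src_img.dtype, device=src_img.device),
        torch.arange(W, dtype=src_img.dtype, device=src_img.device),
        indexing="ij",
    )
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0)
    pix = pix.reshape(1, 3, -1).expand(B, -1, -1)            # (B, 3, H*W)

    # Back-project to 3D camera points: X = D * K^{-1} * p.
    cam = torch.linalg.inv(K) @ pix * tgt_depth.reshape(B, 1, -1)
    ones = torch.ones(B, 1, H * W, dtype=cam.dtype, device=cam.device)
    cam_h = torch.cat([cam, ones], dim=1)                    # (B, 4, H*W)

    # Rigidly transform into the source frame and project with K.
    src_cam = (T @ cam_h)[:, :3]                             # (B, 3, H*W)
    proj = K @ src_cam
    uv = proj[:, :2] / proj[:, 2:].clamp(min=1e-6)           # (B, 2, H*W)

    # Normalize to [-1, 1] for grid_sample, then bilinearly sample.
    u = 2.0 * uv[:, 0] / (W - 1) - 1.0
    v = 2.0 * uv[:, 1] / (H - 1) - 1.0
    grid = torch.stack([u, v], dim=-1).reshape(B, H, W, 2)
    return F.grid_sample(src_img, grid,
                         padding_mode="border", align_corners=True)
```

In a framework of this kind, the warped result would serve as the pseudo-input image and be compared against the target view to build the geometry and appearance losses, with gradients flowing back into both the depth and pose networks.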
Pages: 77529-77547
Number of pages: 19
相关论文
共 50 条
  • [1] Self-Supervised Attention Learning for Depth and Ego-motion Estimation
    Sadek, Assent
    Chidlovskii, Boris
    [J]. 2020 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2020, : 10054 - 10060
  • [2] Semantic and Optical Flow Guided Self-supervised Monocular Depth and Ego-Motion Estimation
    Fang, Jiaojiao
    Liu, Guizhong
    [J]. IMAGE AND GRAPHICS (ICIG 2021), PT III, 2021, 12890 : 465 - 477
  • [3] Self-supervised monocular depth and ego-motion estimation for CT-bronchoscopy fusion
    Chang, Qi
    Higgins, William E.
    [J]. IMAGE-GUIDED PROCEDURES, ROBOTIC INTERVENTIONS, AND MODELING, MEDICAL IMAGING 2024, 2024, 12928
  • [4] Neural Ray Surfaces for Self-Supervised Learning of Depth and Ego-motion
    Vasiljevic, Igor
    Guizilini, Vitor
    Ambrus, Rares
    Pillai, Sudeep
    Burgard, Wolfram
    Shakhnarovich, Greg
    Gaidon, Adrien
    [J]. 2020 INTERNATIONAL CONFERENCE ON 3D VISION (3DV 2020), 2020, : 1 - 11
  • [5] Self-Supervised monocular depth and ego-Motion estimation in endoscopy: Appearance flow to the rescue
    Shao, Shuwei
    Pei, Zhongcai
    Chen, Weihai
    Zhu, Wentao
    Wu, Xingming
    Sun, Dianmin
    Zhang, Baochang
    [J]. MEDICAL IMAGE ANALYSIS, 2022, 77
  • [6] Self-Supervised Depth and Ego-Motion Estimation for Monocular Thermal Video Using Multi-Spectral Consistency Loss
    Shin, Ukcheol
    Lee, Kyunghyun
    Lee, Seokju
    Kweon, In So
    [J]. IEEE ROBOTICS AND AUTOMATION LETTERS, 2022, 7 (02) : 1103 - 1110
  • [7] Maximizing Self-Supervision From Thermal Image for Effective Self-Supervised Learning of Depth and Ego-Motion
    Shin, Ukcheol
    Lee, Kyunghyun
    Lee, Byeong-Uk
    Kweon, In So
    [J]. IEEE ROBOTICS AND AUTOMATION LETTERS, 2022, 7 (03) : 7771 - 7778
  • [8] Unsupervised Scale-consistent Depth and Ego-motion Learning from Monocular Video
    Bian, Jia-Wang
    Li, Zhichao
    Wang, Naiyan
    Zhan, Huangying
    Shen, Chunhua
    Cheng, Ming-Ming
    Reid, Ian
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [9] Self-supervised learning of monocular depth and ego-motion estimation for non-rigid scenes in wireless capsule endoscopy videos
    Liao, Chao
    Wang, Chengliang
    Wang, Peng
    Wu, Hao
    Wang, Hongqian
    [J]. BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2024, 91
  • [10] APAC-Net: Unsupervised Learning of Depth and Ego-Motion from Monocular Video
    Lin, Rui
    Lu, Yao
    Lu, Guangming
    [J]. INTELLIGENCE SCIENCE AND BIG DATA ENGINEERING: VISUAL DATA ENGINEERING, PT I, 2019, 11935 : 336 - 348