Cross-Modal Knowledge Distillation for Depth Privileged Monocular Visual Odometry

Cited: 4
Authors
Li, Bin [1 ]
Wang, Shuling [1 ]
Ye, Haifeng [1 ]
Gong, Xiaojin [1 ]
Xiang, Zhiyu [1 ]
Affiliations
[1] Zhejiang Univ, Coll Informat Sci & Elect Engn, Hangzhou 310027, Zhejiang, Peoples R China
Keywords
Deep learning methods; knowledge distillation; localization; visual odometry;
DOI
10.1109/LRA.2022.3166457
CLC Classification
TP24 [Robotics];
Subject Classification
080202; 1405;
Abstract
Most self-supervised monocular visual odometry (VO) methods suffer from the scale ambiguity problem. A promising way to address this problem is to introduce additional information during training. In this work, we propose a new depth-privileged framework for learning a monocular VO. It assumes that sparse depth is provided at training time but is not available at the test stage. To make full use of the privileged depth information, we propose a cross-modal knowledge distillation method that uses a well-trained visual-lidar odometry (VLO) network as a teacher to guide the training of the VO network. Knowledge distillation is conducted at both the output and hint levels. In addition, a distillation condition check is designed to filter out the noise that may be contained in the teacher's predictions. Experiments on the KITTI odometry benchmark show that the proposed method produces accurate pose estimates with the actual scale recovered. It also outperforms most stereo-privileged monocular VOs.
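The abstract describes distillation at two levels (pose outputs and intermediate "hint" features), gated by a condition check on the teacher's reliability. A minimal sketch of that structure is below; all function and variable names are hypothetical illustrations, not taken from the paper, and the per-sample error comparison stands in for whatever reliability criterion the authors actually use.

```python
def distill_losses(student_pose, teacher_pose,
                   student_feat, teacher_feat,
                   teacher_err, student_err):
    """Hypothetical sketch of output- and hint-level distillation
    gated by a condition check (not the paper's exact formulation)."""
    # Condition check: skip distillation when the teacher's own error
    # (e.g., a reprojection residual) is no better than the student's,
    # so noisy teacher predictions do not supervise the student.
    if teacher_err >= student_err:
        return 0.0, 0.0
    # Output-level distillation: penalize disagreement between the
    # student's and teacher's pose vectors (L1 here for simplicity).
    output_loss = sum(abs(s - t) for s, t in zip(student_pose, teacher_pose))
    # Hint-level distillation: match intermediate feature activations.
    hint_loss = sum((s - t) ** 2 for s, t in zip(student_feat, teacher_feat))
    return output_loss, hint_loss
```

In a real training loop these two terms would be weighted and added to the usual self-supervised photometric loss; the gating means the teacher only contributes supervision on samples where it appears trustworthy.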
Pages: 6171-6178 (8 pages)