Self-Supervised Visual Terrain Classification From Unsupervised Acoustic Feature Learning

Cited by: 47
Authors
Zurn, Jannik [1]
Burgard, Wolfram [1, 2]
Valada, Abhinav [1]
Affiliations
[1] Univ Freiburg, Dept Comp Sci, D-79085 Breisgau, Germany
[2] Toyota Res Inst, Automated Driving Technol, Los Altos, CA 94022 USA
Keywords
Robot sensing systems; Visualization; Semantics; Labeling; Mobile robots; Trajectory; Multi-modal sensors; Robot learning; Unsupervised learning; Robot navigation
DOI
10.1109/TRO.2020.3031214
Chinese Library Classification (CLC) number
TP24 [Robotics]
Discipline classification code
080202; 1405
Abstract
Mobile robots operating in unknown urban environments encounter a wide range of complex terrains to which they must adapt their planned trajectory for safe and efficient navigation. Most existing approaches utilize supervised learning to classify terrains from either an exteroceptive or a proprioceptive sensor modality. However, this requires a tremendous amount of manual labeling effort for each newly encountered terrain as well as for variations of terrains caused by changing environmental conditions. In this article, we propose a novel terrain classification framework leveraging an unsupervised proprioceptive classifier that learns from vehicle-terrain interaction sounds to self-supervise an exteroceptive classifier for pixelwise semantic segmentation of images. To this end, we first learn a discriminative embedding space for vehicle-terrain interaction sounds from triplets of audio clips formed using visual features of the corresponding terrain patches and cluster the resulting embeddings. We subsequently use these clusters to label the visual terrain patches by projecting the traversed tracks of the robot into the camera images. Finally, we use the sparsely labeled images to train our semantic segmentation network in a weakly supervised manner. We present extensive quantitative and qualitative results that demonstrate that our proprioceptive terrain classifier exceeds the state-of-the-art among unsupervised methods and our self-supervised exteroceptive semantic segmentation model achieves a comparable performance to supervised learning with manually labeled data.
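The abstract mentions learning an embedding space from triplets of audio clips. As a rough illustration of what a triplet objective looks like, here is a minimal sketch in plain Python. The Euclidean distance, the margin value of 1.0, the function names, and the toy 2-D vectors are all assumptions made for illustration; this record does not state which loss, margin, or distance the authors use.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss (illustrative form, not the paper's exact loss).

    The loss is zero once the anchor is closer to the positive than to the
    negative by at least `margin`; otherwise it grows linearly.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy embeddings (hypothetical): the positive is near the anchor, the negative is far.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]   # distance 1 from the anchor
negative = [3.0, 4.0]   # distance 5 from the anchor

# Well-separated triplet: 1 - 5 + 1 is negative, so the loss is clipped to 0.
loss_easy = triplet_margin_loss(anchor, positive, negative)

# Swapping the roles gives a violated triplet: 5 - 1 + 1 = 5.
loss_hard = triplet_margin_loss(anchor, negative, positive)
```

In a setup like the one the abstract describes, the anchor, positive, and negative would be embeddings of audio clips, with triplets chosen using visual features of the corresponding terrain patches; the resulting embeddings would then be clustered to obtain terrain labels.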
Pages: 466-481
Number of pages: 16