Monocular Robot Navigation with Self-Supervised Pretrained Vision Transformers

Cited by: 0
Authors
Saavedra-Ruiz, Miguel [1 ]
Morin, Sacha [1 ]
Paull, Liam [1 ]
Affiliations
[1] Univ Montreal, Mila Quebec AI Inst, DIRO, Montreal, PQ, Canada
Keywords
Vision Transformer; Image Segmentation; Visual Servoing
DOI
10.1109/CRV55824.2022.00033
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
In this work, we consider the problem of learning a perception model for monocular robot navigation using few annotated images. Using a Vision Transformer (ViT) pretrained with a label-free self-supervised method, we successfully train a coarse image segmentation model for the Duckietown environment using 70 training images. Our model performs coarse image segmentation at the 8x8 patch level, and the inference resolution can be adjusted to balance prediction granularity and real-time perception constraints. We study how best to adapt a ViT to our task and environment, and find that some lightweight architectures can yield good single-image segmentations at a usable frame rate, even on CPU. The resulting perception model is used as the backbone for a simple yet robust visual servoing agent, which we deploy on a differential drive mobile robot to perform two tasks: lane following and obstacle avoidance.
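As a rough illustration of the approach described in the abstract (not the authors' released code), the sketch below puts a lightweight linear head on top of a frozen self-supervised ViT-S/8 backbone, here the publicly available DINO weights loaded via torch.hub, to predict one class per 8x8 patch, followed by a toy steering rule in the spirit of the visual servoing agent. The class list, the segment_patches and steering_from_mask helpers, and the controller gain are illustrative assumptions, not details taken from the paper.

import torch
import torch.nn as nn

# Frozen self-supervised backbone: DINO ViT-S/8 from the public
# facebookresearch/dino repository (an assumption; the paper's exact
# checkpoint and head may differ). In practice inputs should be
# ImageNet-normalized RGB tensors.
backbone = torch.hub.load("facebookresearch/dino:main", "dino_vits8")
backbone.eval()
for p in backbone.parameters():
    p.requires_grad = False

NUM_CLASSES = 3  # hypothetical coarse classes, e.g. background / lane / obstacle
head = nn.Linear(backbone.embed_dim, NUM_CLASSES)  # lightweight per-patch classifier


def segment_patches(images: torch.Tensor) -> torch.Tensor:
    # images: (B, 3, H, W) with H and W divisible by 8. Lowering the input
    # resolution coarsens the prediction grid but speeds up inference,
    # the granularity vs. frame-rate trade-off mentioned in the abstract.
    with torch.no_grad():
        tokens = backbone.get_intermediate_layers(images, n=1)[0]  # (B, 1+N, D)
    patch_tokens = tokens[:, 1:, :]                 # drop the CLS token
    logits = head(patch_tokens)                     # (B, N, NUM_CLASSES)
    b = images.shape[0]
    h, w = images.shape[-2] // 8, images.shape[-1] // 8
    return logits.reshape(b, h, w, NUM_CLASSES)     # coarse patch-level map


def steering_from_mask(seg_logits: torch.Tensor, lane_class: int = 1,
                       gain: float = 0.5) -> torch.Tensor:
    # Toy proportional rule (illustrative, not the paper's controller):
    # steer toward the horizontal centroid of patches classified as lane.
    pred = seg_logits.argmax(dim=-1)                # (B, h, w)
    lane = (pred == lane_class).float()
    w = lane.shape[-1]
    cols = torch.arange(w, dtype=torch.float32)
    mass = lane.sum(dim=1)                          # per-column lane mass, (B, w)
    centroid = (mass * cols).sum(-1) / mass.sum(-1).clamp(min=1.0)
    offset = centroid / (w - 1) - 0.5               # normalized lateral offset
    return -gain * offset                           # angular-velocity command


# Example: a 480x640 frame gives a 60x80 patch grid.
frame = torch.randn(1, 3, 480, 640)
seg = segment_patches(frame)
print(seg.shape, steering_from_mask(seg))

In this sketch only the linear head would be trained, which is consistent with the abstract's point that a frozen self-supervised backbone lets a usable segmentation model be learned from only a few dozen annotated images.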
Pages: 197-204
Number of pages: 8
Related Papers
50 in total
  • [32] Reverse optical flow for self-supervised adaptive autonomous robot navigation
    Lookingbill, A.
    Rogers, J.
    Lieb, D.
    Curry, J.
    Thrun, S.
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2007, 74 (03) : 287 - 302
  • [33] Detail-Preserving Self-Supervised Monocular Depth with Self-Supervised Structural Sharpening
    Bello, Juan Luis Gonzalez
    Moon, Jaeho
    Kim, Munchurl
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW, 2023 : 254 - 264
  • [34] DatUS: Data-Driven Unsupervised Semantic Segmentation With Pretrained Self-Supervised Vision Transformer
    Kumar, Sonal
    Sur, Arijit
    Baruah, Rashmi Dutta
    IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2024, 16 (05) : 1775 - 1788
  • [35] Monocular Vision Navigation and Control of Mobile Robot
    Yan, Runchen
    Wang, Hong
    Yang, Yuzhi
    Wei, Huanbing
    Wang, Yonggang
    CONFERENCE ON MODELING, IDENTIFICATION AND CONTROL, 2012, 3 : 707 - 714
  • [36] Automatic Navigation for A Mobile Robot with Monocular Vision
    Zhan, Qiang
    Huang, Shouren
    Wu, Jia
    2008 IEEE CONFERENCE ON ROBOTICS, AUTOMATION, AND MECHATRONICS, VOLS 1 AND 2, 2008 : 531 - 536
  • [37] ROBOT NAVIGATION USING LANDMARKS AND MONOCULAR VISION
    Manoiu-Olaru, S.
    Nitulescu, M.
    PROCEEDINGS OF 11TH INTERNATIONAL CARPATHIAN CONTROL CONFERENCE, 2010 : 493 - 496
  • [38] Digging Into Self-Supervised Monocular Depth Estimation
    Godard, Clement
    Mac Aodha, Oisin
    Firman, Michael
    Brostow, Gabriel
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019 : 3827 - 3837
  • [39] Distilling Self-Supervised Vision Transformers for Weakly-Supervised Few-Shot Classification & Segmentation
    Kang, Dahyun
    Koniusz, Piotr
    Cho, Minsu
    Murray, Naila
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023 : 19627 - 19638
  • [40] Self-supervised monocular depth estimation in fog
    Tao, Bo
    Hu, Jiaxin
    Jiang, Du
    Li, Gongfa
    Chen, Baojia
    Qian, Xinbo
    OPTICAL ENGINEERING, 2023, 62 (03)