Monocular Robot Navigation with Self-Supervised Pretrained Vision Transformers

Cited by: 0
Authors
Saavedra-Ruiz, Miguel [1 ]
Morin, Sacha [1 ]
Paull, Liam [1 ]
Affiliations
[1] Univ Montreal, Mila Quebec AI Inst, DIRO, Montreal, PQ, Canada
Keywords
Vision Transformer; Image Segmentation; Visual Servoing
DOI
10.1109/CRV55824.2022.00033
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
In this work, we consider the problem of learning a perception model for monocular robot navigation using few annotated images. Using a Vision Transformer (ViT) pretrained with a label-free self-supervised method, we successfully train a coarse image segmentation model for the Duckietown environment using 70 training images. Our model performs coarse image segmentation at the 8x8 patch level, and the inference resolution can be adjusted to balance prediction granularity and real-time perception constraints. We study how best to adapt a ViT to our task and environment, and find that some lightweight architectures can yield good single-image segmentations at a usable frame rate, even on CPU. The resulting perception model is used as the backbone for a simple yet robust visual servoing agent, which we deploy on a differential drive mobile robot to perform two tasks: lane following and obstacle avoidance.
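As a concrete illustration of the patch-level scheme the abstract describes, the sketch below attaches a small linear head to a frozen DINO-pretrained ViT-S/8 (one label-free self-supervised backbone available via torch.hub). This is a minimal sketch, not the authors' implementation: the linear head, the class count, and the input resolution are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Assumption: a DINO-pretrained ViT-S/8 backbone from torch.hub (embed dim 384).
backbone = torch.hub.load('facebookresearch/dino:main', 'dino_vits8')

class PatchSegmenter(nn.Module):
    """Coarse segmenter: one class logit per 8x8 image patch (illustrative)."""
    def __init__(self, backbone, embed_dim=384, num_classes=3):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():   # keep the pretrained ViT frozen
            p.requires_grad = False
        self.head = nn.Linear(embed_dim, num_classes)  # lightweight head, trained on few images

    def forward(self, x):
        with torch.no_grad():
            # Last-layer tokens: (B, 1 + N, D), with the [CLS] token first.
            tokens = self.backbone.get_intermediate_layers(x, n=1)[0]
        patch_tokens = tokens[:, 1:, :]          # drop [CLS] -> (B, N, D)
        logits = self.head(patch_tokens)         # (B, N, num_classes)
        b, n, c = logits.shape
        side = int(n ** 0.5)                     # square grid of patches
        return logits.transpose(1, 2).reshape(b, c, side, side)

model = PatchSegmenter(backbone).eval()
img = torch.randn(1, 3, 224, 224)   # 224 px / 8 px patches = 28x28 grid
seg = model(img)                    # -> (1, 3, 28, 28) patch-level logits
```

Note that shrinking the input resolution directly reduces the number of patch tokens, which is one way the inference-time trade-off between prediction granularity and frame rate mentioned in the abstract can be realized.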
Pages: 197-204
Number of pages: 8
Related Papers
50 records in total
  • [21] Multi-level Contrastive Learning for Self-Supervised Vision Transformers
    Mo, Shentong
    Sun, Zhun
    Li, Chao
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 2777 - 2786
  • [22] Patch-level Representation Learning for Self-supervised Vision Transformers
    Yun, Sukmin
    Lee, Hankook
    Kim, Jaehyung
    Shin, Jinwoo
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 8344 - 8353
  • [23] Self-Supervised Monocular Depth Hints
    Watson, Jamie
    Firman, Michael
    Brostow, Gabriel J.
    Turmukhambetov, Daniyar
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 2162 - 2171
  • [24] Self-Supervised Monocular Depth Underwater
    Amitai, Shlomi
    Klein, Itzik
    Treibitz, Tali
    2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA, 2023, : 1098 - 1104
  • [25] Persistent self-supervised learning: From stereo to monocular vision for obstacle avoidance
    van Hecke, Kevin
    de Croon, Guido
    van der Maaten, Laurens
    Hennes, Daniel
    Izzo, Dario
    INTERNATIONAL JOURNAL OF MICRO AIR VEHICLES, 2018, 10 (02) : 186 - 206
  • [26] Self-Supervised Transformers for fMRI representation
    Malkiel, Itzik
    Rosenman, Gony
    Wolf, Lior
    Hendler, Talma
    INTERNATIONAL CONFERENCE ON MEDICAL IMAGING WITH DEEP LEARNING, VOL 172, 2022, 172 : 895 - 913
  • [27] On Separate Normalization in Self-supervised Transformers
    Chen, Xiaohui
    Wang, Yinkai
    Du, Yuanqi
    Hassoun, Soha
    Liu, Li-Ping
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [28] Gait Recognition with Self-Supervised Learning of Gait Features Based on Vision Transformers
    Pincic, Domagoj
    Susanj, Diego
    Lenac, Kristijan
    SENSORS, 2022, 22 (19)
  • [29] Scaling Vision Transformers to Gigapixel Images via Hierarchical Self-Supervised Learning
    Chen, Richard J.
    Chen, Chengkuan
    Li, Yicong
    Chen, Tiffany Y.
    Trister, Andrew D.
    Krishnan, Rahul G.
    Mahmood, Faisal
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 16123 - 16134
  • [30] SELF-SUPERVISED VISION TRANSFORMERS FOR JOINT SAR-OPTICAL REPRESENTATION LEARNING
    Wang, Yi
    Albrecht, Conrad M.
    Zhu, Xiao Xiang
    2022 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2022), 2022, : 139 - 142