Monocular Robot Navigation with Self-Supervised Pretrained Vision Transformers

Cited by: 0
Authors
Saavedra-Ruiz, Miguel [1 ]
Morin, Sacha [1 ]
Paull, Liam [1 ]
Affiliations
[1] Univ Montreal, Mila Quebec AI Inst, DIRO, Montreal, PQ, Canada
Keywords
Vision Transformer; Image Segmentation; Visual Servoing
DOI
10.1109/CRV55824.2022.00033
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
In this work, we consider the problem of learning a perception model for monocular robot navigation using few annotated images. Using a Vision Transformer (ViT) pretrained with a label-free self-supervised method, we successfully train a coarse image segmentation model for the Duckietown environment using 70 training images. Our model performs coarse image segmentation at the 8x8 patch level, and the inference resolution can be adjusted to balance prediction granularity and real-time perception constraints. We study how best to adapt a ViT to our task and environment, and find that some lightweight architectures can yield good single-image segmentations at a usable frame rate, even on CPU. The resulting perception model is used as the backbone for a simple yet robust visual servoing agent, which we deploy on a differential drive mobile robot to perform two tasks: lane following and obstacle avoidance.
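As a rough illustration of the approach described in the abstract (not the authors' released code), the sketch below puts a lightweight linear head on top of a frozen self-supervised ViT-S/8 backbone, here the publicly available DINO weights loaded via torch.hub, to predict one class per 8x8 patch, followed by a toy steering rule in the spirit of the visual servoing agent. The class list, the segment_patches and steering_from_mask helpers, and the controller gain are illustrative assumptions, not details taken from the paper.

import torch
import torch.nn as nn

# Frozen self-supervised backbone: DINO ViT-S/8 from the public
# facebookresearch/dino repository (an assumption; the paper's exact
# checkpoint and head may differ). In practice inputs should be
# ImageNet-normalized RGB tensors.
backbone = torch.hub.load("facebookresearch/dino:main", "dino_vits8")
backbone.eval()
for p in backbone.parameters():
    p.requires_grad = False

NUM_CLASSES = 3  # hypothetical coarse classes, e.g. background / lane / obstacle
head = nn.Linear(backbone.embed_dim, NUM_CLASSES)  # lightweight per-patch classifier


def segment_patches(images: torch.Tensor) -> torch.Tensor:
    # images: (B, 3, H, W) with H and W divisible by 8. Lowering the input
    # resolution coarsens the prediction grid but speeds up inference,
    # the granularity vs. frame-rate trade-off mentioned in the abstract.
    with torch.no_grad():
        tokens = backbone.get_intermediate_layers(images, n=1)[0]  # (B, 1+N, D)
    patch_tokens = tokens[:, 1:, :]                 # drop the CLS token
    logits = head(patch_tokens)                     # (B, N, NUM_CLASSES)
    b = images.shape[0]
    h, w = images.shape[-2] // 8, images.shape[-1] // 8
    return logits.reshape(b, h, w, NUM_CLASSES)     # coarse patch-level map


def steering_from_mask(seg_logits: torch.Tensor, lane_class: int = 1,
                       gain: float = 0.5) -> torch.Tensor:
    # Toy proportional rule (illustrative, not the paper's controller):
    # steer toward the horizontal centroid of patches classified as lane.
    pred = seg_logits.argmax(dim=-1)                # (B, h, w)
    lane = (pred == lane_class).float()
    w = lane.shape[-1]
    cols = torch.arange(w, dtype=torch.float32)
    mass = lane.sum(dim=1)                          # per-column lane mass, (B, w)
    centroid = (mass * cols).sum(-1) / mass.sum(-1).clamp(min=1.0)
    offset = centroid / (w - 1) - 0.5               # normalized lateral offset
    return -gain * offset                           # angular-velocity command


# Example: a 480x640 frame gives a 60x80 patch grid.
frame = torch.randn(1, 3, 480, 640)
seg = segment_patches(frame)
print(seg.shape, steering_from_mask(seg))

In this sketch only the linear head would be trained, which is consistent with the abstract's point that a frozen self-supervised backbone lets a usable segmentation model be learned from only a few dozen annotated images.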
Pages: 197-204
Number of pages: 8
Related Papers
50 in total
  • [32] Reverse optical flow for self-supervised adaptive autonomous robot navigation
    Lookingbill, A.
    Rogers, J.
    Lieb, D.
    Curry, J.
    Thrun, S.
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2007, 74 (03) : 287 - 302
  • [33] Detail-Preserving Self-Supervised Monocular Depth with Self-Supervised Structural Sharpening
    Bello, Juan Luis Gonzalez
    Moon, Jaeho
    Kim, Munchurl
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW, 2023 : 254 - 264
  • [34] DatUS: Data-Driven Unsupervised Semantic Segmentation With Pretrained Self-Supervised Vision Transformer
    Kumar, Sonal
    Sur, Arijit
    Baruah, Rashmi Dutta
    IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2024, 16 (05) : 1775 - 1788
  • [35] Monocular Vision Navigation and Control of Mobile Robot
    Yan, Runchen
    Wang, Hong
    Yang, Yuzhi
    Wei, Huanbing
    Wang, Yonggang
    CONFERENCE ON MODELING, IDENTIFICATION AND CONTROL, 2012, 3 : 707 - 714
  • [36] Automatic Navigation for A Mobile Robot with Monocular Vision
    Zhan, Qiang
    Huang, Shouren
    Wu, Jia
    2008 IEEE CONFERENCE ON ROBOTICS, AUTOMATION, AND MECHATRONICS, VOLS 1 AND 2, 2008 : 531 - 536
  • [37] ROBOT NAVIGATION USING LANDMARKS AND MONOCULAR VISION
    Manoiu-Olaru, S.
    Nitulescu, M.
    PROCEEDINGS OF 11TH INTERNATIONAL CARPATHIAN CONTROL CONFERENCE, 2010 : 493 - 496
  • [38] Digging Into Self-Supervised Monocular Depth Estimation
    Godard, Clement
    Mac Aodha, Oisin
    Firman, Michael
    Brostow, Gabriel
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019 : 3827 - 3837
  • [39] Distilling Self-Supervised Vision Transformers for Weakly-Supervised Few-Shot Classification & Segmentation
    Kang, Dahyun
    Koniusz, Piotr
    Cho, Minsu
    Murray, Naila
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023 : 19627 - 19638
  • [40] Self-supervised monocular depth estimation in fog
    Tao, Bo
    Hu, Jiaxin
    Jiang, Du
    Li, Gongfa
    Chen, Baojia
    Qian, Xinbo
    OPTICAL ENGINEERING, 2023, 62 (03)