Monocular Robot Navigation with Self-Supervised Pretrained Vision Transformers

Cited by: 0
Authors
Saavedra-Ruiz, Miguel [1 ]
Morin, Sacha [1 ]
Paull, Liam [1 ]
Affiliations
[1] Univ Montreal, Mila Quebec AI Inst, DIRO, Montreal, PQ, Canada
Keywords
Vision Transformer; Image Segmentation; Visual Servoing
DOI
10.1109/CRV55824.2022.00033
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
In this work, we consider the problem of learning a perception model for monocular robot navigation using few annotated images. Using a Vision Transformer (ViT) pretrained with a label-free self-supervised method, we successfully train a coarse image segmentation model for the Duckietown environment using 70 training images. Our model performs coarse image segmentation at the 8x8 patch level, and the inference resolution can be adjusted to balance prediction granularity and real-time perception constraints. We study how best to adapt a ViT to our task and environment, and find that some lightweight architectures can yield good single-image segmentations at a usable frame rate, even on CPU. The resulting perception model is used as the backbone for a simple yet robust visual servoing agent, which we deploy on a differential drive mobile robot to perform two tasks: lane following and obstacle avoidance.
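As a concrete illustration of the patch-level scheme the abstract describes, the sketch below attaches a small linear head to a frozen DINO-pretrained ViT-S/8 (one label-free self-supervised backbone available via torch.hub). This is a minimal sketch, not the authors' implementation: the linear head, the class count, and the input resolution are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Assumption: a DINO-pretrained ViT-S/8 backbone from torch.hub (embed dim 384).
backbone = torch.hub.load('facebookresearch/dino:main', 'dino_vits8')

class PatchSegmenter(nn.Module):
    """Coarse segmenter: one class logit per 8x8 image patch (illustrative)."""
    def __init__(self, backbone, embed_dim=384, num_classes=3):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():   # keep the pretrained ViT frozen
            p.requires_grad = False
        self.head = nn.Linear(embed_dim, num_classes)  # lightweight head, trained on few images

    def forward(self, x):
        with torch.no_grad():
            # Last-layer tokens: (B, 1 + N, D), with the [CLS] token first.
            tokens = self.backbone.get_intermediate_layers(x, n=1)[0]
        patch_tokens = tokens[:, 1:, :]          # drop [CLS] -> (B, N, D)
        logits = self.head(patch_tokens)         # (B, N, num_classes)
        b, n, c = logits.shape
        side = int(n ** 0.5)                     # square grid of patches
        return logits.transpose(1, 2).reshape(b, c, side, side)

model = PatchSegmenter(backbone).eval()
img = torch.randn(1, 3, 224, 224)   # 224 px / 8 px patches = 28x28 grid
seg = model(img)                    # -> (1, 3, 28, 28) patch-level logits
```

Note that shrinking the input resolution directly reduces the number of patch tokens, which is one way the inference-time trade-off between prediction granularity and frame rate mentioned in the abstract can be realized.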
Pages: 197-204
Number of pages: 8
Related Papers
50 records in total
  • [21] Multi-level Contrastive Learning for Self-Supervised Vision Transformers
    Mo, Shentong
    Sun, Zhun
    Li, Chao
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 2777 - 2786
  • [22] Patch-level Representation Learning for Self-supervised Vision Transformers
    Yun, Sukmin
    Lee, Hankook
    Kim, Jaehyung
    Shin, Jinwoo
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 8344 - 8353
  • [23] Self-Supervised Monocular Depth Hints
    Watson, Jamie
    Firman, Michael
    Brostow, Gabriel J.
    Turmukhambetov, Daniyar
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 2162 - 2171
  • [24] Self-Supervised Monocular Depth Underwater
    Amitai, Shlomi
    Klein, Itzik
    Treibitz, Tali
    2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA, 2023, : 1098 - 1104
  • [25] Persistent self-supervised learning: From stereo to monocular vision for obstacle avoidance
    van Hecke, Kevin
    de Croon, Guido
    van der Maaten, Laurens
    Hennes, Daniel
    Izzo, Dario
    INTERNATIONAL JOURNAL OF MICRO AIR VEHICLES, 2018, 10 (02) : 186 - 206
  • [26] Self-Supervised Transformers for fMRI representation
    Malkiel, Itzik
    Rosenman, Gony
    Wolf, Lior
    Hendler, Talma
    INTERNATIONAL CONFERENCE ON MEDICAL IMAGING WITH DEEP LEARNING, VOL 172, 2022, 172 : 895 - 913
  • [27] On Separate Normalization in Self-supervised Transformers
    Chen, Xiaohui
    Wang, Yinkai
    Du, Yuanqi
    Hassoun, Soha
    Liu, Li-Ping
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [28] Gait Recognition with Self-Supervised Learning of Gait Features Based on Vision Transformers
    Pincic, Domagoj
    Susanj, Diego
    Lenac, Kristijan
    SENSORS, 2022, 22 (19)
  • [29] Scaling Vision Transformers to Gigapixel Images via Hierarchical Self-Supervised Learning
    Chen, Richard J.
    Chen, Chengkuan
    Li, Yicong
    Chen, Tiffany Y.
    Trister, Andrew D.
    Krishnan, Rahul G.
    Mahmood, Faisal
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 16123 - 16134
  • [30] SELF-SUPERVISED VISION TRANSFORMERS FOR JOINT SAR-OPTICAL REPRESENTATION LEARNING
    Wang, Yi
    Albrecht, Conrad M.
    Zhu, Xiao Xiang
    2022 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2022), 2022, : 139 - 142