LLM as Copilot for Coarse-Grained Vision-and-Language Navigation

Cited by: 0
Authors
Qiao, Yanyuan [1 ]
Liu, Qianyi [2 ,3 ]
Liu, Jiajun [4 ,5 ]
Liu, Jing [2 ,3 ]
Wu, Qi [1 ]
Affiliations
[1] Univ Adelaide, Australian Inst Machine Learning, Adelaide, SA, Australia
[2] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
[3] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
[4] CSIRO Data61, Eveleigh, Australia
[5] Univ Queensland, Brisbane, Qld, Australia
Keywords
Vision-and-Language Navigation; Large Language Models
DOI
10.1007/978-3-031-72652-1_27
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Vision-and-Language Navigation (VLN) involves guiding an agent through indoor environments using human-provided textual instructions. Coarse-grained VLN, which uses short, high-level instructions, has gained popularity because it closely mirrors real-world scenarios. However, a significant challenge is that these instructions are often too concise for agents to comprehend and act upon. Previous studies have explored allowing agents to seek assistance during navigation, but they typically offer rigid support drawn from pre-existing datasets or simulators. The advent of Large Language Models (LLMs) presents a novel avenue for aiding VLN agents. This paper introduces VLN-Copilot, a framework that enables agents to actively seek assistance when they become confused, with the LLM serving as a copilot to facilitate navigation. Our approach introduces a confusion score that quantifies the uncertainty in an agent's action decisions; when confusion is high, the LLM offers real-time, detailed guidance for navigation. Experimental results on two coarse-grained VLN datasets demonstrate the efficacy of our method.
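The abstract does not give the formula for the confusion score. A minimal illustrative sketch, assuming a normalized-entropy formulation over the agent's predicted action distribution (the function names `confusion_score` and `should_ask_copilot` and the threshold value are hypothetical, not from the paper):

```python
import math

def confusion_score(action_probs):
    """Normalized entropy of the agent's action distribution, in [0, 1].

    Hypothetical formulation: higher values mean the agent is less
    certain which navigation action to take.
    """
    n = len(action_probs)
    if n <= 1:
        return 0.0
    entropy = -sum(p * math.log(p) for p in action_probs if p > 0)
    return entropy / math.log(n)  # divide by max entropy (uniform case)

def should_ask_copilot(action_probs, threshold=0.8):
    """Query the LLM copilot when confusion exceeds a chosen threshold."""
    return confusion_score(action_probs) > threshold
```

Under this sketch, a peaked distribution such as `[0.97, 0.01, 0.01, 0.01]` yields a low score and the agent proceeds alone, while a near-uniform distribution triggers a request for LLM guidance.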
Pages: 459-476
Page count: 18
Related Papers
50 records
  • [41] Cluster-based Curriculum Learning for Vision-and-Language Navigation
    Wang, Ting
    Wu, Zongkai
    Liu, Zihan
    Wang, Donglin
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,
  • [42] Vision-and-Language Navigation via Latent Semantic Alignment Learning
    Wu, Siying
    Fu, Xueyang
    Wu, Feng
    Zha, Zheng-Jun
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 8406 - 8418
  • [43] Joint Multimodal Embedding and Backtracking Search in Vision-and-Language Navigation
    Hwang, Jisu
    Kim, Incheol
    SENSORS, 2021, 21 (03) : 1 - 23
  • [44] FedVLN: Privacy-Preserving Federated Vision-and-Language Navigation
    Zhou, Kaiwen
    Wang, Xin Eric
    COMPUTER VISION, ECCV 2022, PT XXXVI, 2022, 13696 : 682 - 699
  • [45] Enhancing Scene Understanding for Vision-and-Language Navigation by Knowledge Awareness
    Gao, Fang
    Tang, Jingfeng
    Wang, Jiabao
    Li, Shaodong
    Yu, Jun
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2024, 9 (12): : 10874 - 10881
  • [46] Take the Scenic Route: Improving Generalization in Vision-and-Language Navigation
    Yu, Felix
    Deng, Zhiwei
    Narasimhan, Karthik
    Russakovsky, Olga
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2020), 2020, : 4000 - 4004
  • [47] Tree-Structured Trajectory Encoding for Vision-and-Language Navigation
    Zhou, Xinzhe
    Mu, Yadong
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 3, 2023, : 3814 - 3824
  • [48] VLN↻BERT: A Recurrent Vision-and-Language BERT for Navigation
    Hong, Yicong
    Wu, Qi
    Qi, Yuankai
    Rodriguez-Opazo, Cristian
    Gould, Stephen
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 1643 - 1653
  • [49] GesNavi: Gesture-guided Outdoor Vision-and-Language Navigation
    Jain, Aman
    Misu, Teruhisa
    Yamada, Kentaro
    Yanaka, Hitomi
    PROCEEDINGS OF THE 18TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: STUDENT RESEARCH WORKSHOP, 2024, : 290 - 295
  • [50] Survey on the Research Progress and Development Trend of Vision-and-Language Navigation
    Niu K.
    Wang P.
    Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics, 2022, 34 (12): : 1815 - 1827