LLM as Copilot for Coarse-Grained Vision-and-Language Navigation

Cited by: 0
Authors
Qiao, Yanyuan [1]
Liu, Qianyi [2,3]
Liu, Jiajun [4,5]
Liu, Jing [2,3]
Wu, Qi [1]
Affiliations
[1] Univ Adelaide, Australian Inst Machine Learning, Adelaide, SA, Australia
[2] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
[3] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
[4] CSIRO Data61, Eveleigh, Australia
[5] Univ Queensland, Brisbane, Qld, Australia
Source
Keywords
Vision-and-Language Navigation; Large Language Models
DOI
10.1007/978-3-031-72652-1_27
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
Vision-and-Language Navigation (VLN) involves guiding an agent through indoor environments using human-provided textual instructions. Coarse-grained VLN, with short and high-level instructions, has gained popularity as it closely mirrors real-world scenarios. However, a significant challenge is that these instructions are often too concise for agents to comprehend and act upon. Previous studies have explored allowing agents to seek assistance during navigation, but they typically offer rigid support drawn from pre-existing datasets or simulators. The advent of Large Language Models (LLMs) presents a novel avenue for aiding VLN agents. This paper introduces VLN-Copilot, a framework that enables agents to actively seek assistance when they encounter confusion, with the LLM serving as a copilot to facilitate navigation. Our approach introduces a confusion score that quantifies the level of uncertainty in an agent's action decisions, while the LLM offers real-time detailed guidance for navigation. Experimental results on two coarse-grained VLN datasets show the efficacy of our method.
Pages: 459-476
Page count: 18