LLM as Copilot for Coarse-Grained Vision-and-Language Navigation

Cited by: 0
Authors
Qiao, Yanyuan [1 ]
Liu, Qianyi [2 ,3 ]
Liu, Jiajun [4 ,5 ]
Liu, Jing [2 ,3 ]
Wu, Qi [1 ]
Affiliations
[1] Univ Adelaide, Australian Inst Machine Learning, Adelaide, SA, Australia
[2] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
[3] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
[4] CSIRO Data61, Eveleigh, Australia
[5] Univ Queensland, Brisbane, Qld, Australia
Keywords
Vision-and-Language Navigation; Large Language Models
DOI
10.1007/978-3-031-72652-1_27
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Vision-and-Language Navigation (VLN) involves guiding an agent through indoor environments using human-provided textual instructions. Coarse-grained VLN, which uses short, high-level instructions, has gained popularity because it closely mirrors real-world scenarios. However, a significant challenge is that these instructions are often too concise for agents to comprehend and act upon. Previous studies have explored allowing agents to seek assistance during navigation, but they typically offer rigid support drawn from pre-existing datasets or simulators. The advent of Large Language Models (LLMs) presents a novel avenue for aiding VLN agents. This paper introduces VLN-Copilot, a framework that enables agents to actively seek assistance when they become confused, with the LLM serving as a copilot to facilitate navigation. Our approach introduces a confusion score that quantifies the uncertainty in an agent's action decisions; when confusion is high, the LLM offers real-time, detailed guidance for navigation. Experimental results on two coarse-grained VLN datasets demonstrate the efficacy of our method.
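The abstract does not give the formula for the confusion score. A minimal illustrative sketch, assuming a normalized-entropy formulation over the agent's predicted action distribution (the function names `confusion_score` and `should_ask_copilot` and the threshold value are hypothetical, not from the paper):

```python
import math

def confusion_score(action_probs):
    """Normalized entropy of the agent's action distribution, in [0, 1].

    Hypothetical formulation: higher values mean the agent is less
    certain which navigation action to take.
    """
    n = len(action_probs)
    if n <= 1:
        return 0.0
    entropy = -sum(p * math.log(p) for p in action_probs if p > 0)
    return entropy / math.log(n)  # divide by max entropy (uniform case)

def should_ask_copilot(action_probs, threshold=0.8):
    """Query the LLM copilot when confusion exceeds a chosen threshold."""
    return confusion_score(action_probs) > threshold
```

Under this sketch, a peaked distribution such as `[0.97, 0.01, 0.01, 0.01]` yields a low score and the agent proceeds alone, while a near-uniform distribution triggers a request for LLM guidance.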
Pages: 459-476
Page count: 18
Related Papers
50 records
  • [41] Cluster-based Curriculum Learning for Vision-and-Language Navigation
    Wang, Ting
    Wu, Zongkai
    Liu, Zihan
    Wang, Donglin
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,
  • [42] Vision-and-Language Navigation via Latent Semantic Alignment Learning
    Wu, Siying
    Fu, Xueyang
    Wu, Feng
    Zha, Zheng-Jun
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 8406 - 8418
  • [43] Joint Multimodal Embedding and Backtracking Search in Vision-and-Language Navigation
    Hwang, Jisu
    Kim, Incheol
    SENSORS, 2021, 21 (03) : 1 - 23
  • [44] FedVLN: Privacy-Preserving Federated Vision-and-Language Navigation
    Zhou, Kaiwen
    Wang, Xin Eric
    COMPUTER VISION, ECCV 2022, PT XXXVI, 2022, 13696 : 682 - 699
  • [45] Enhancing Scene Understanding for Vision-and-Language Navigation by Knowledge Awareness
    Gao, Fang
    Tang, Jingfeng
    Wang, Jiabao
    Li, Shaodong
    Yu, Jun
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2024, 9 (12): : 10874 - 10881
  • [46] Take the Scenic Route: Improving Generalization in Vision-and-Language Navigation
    Yu, Felix
    Deng, Zhiwei
    Narasimhan, Karthik
    Russakovsky, Olga
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2020), 2020, : 4000 - 4004
  • [47] Tree-Structured Trajectory Encoding for Vision-and-Language Navigation
    Zhou, Xinzhe
    Mu, Yadong
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 3, 2023, : 3814 - 3824
  • [48] VLN↻BERT: A Recurrent Vision-and-Language BERT for Navigation
    Hong, Yicong
    Wu, Qi
    Qi, Yuankai
    Rodriguez-Opazo, Cristian
    Gould, Stephen
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 1643 - 1653
  • [49] GesNavi: Gesture-guided Outdoor Vision-and-Language Navigation
    Jain, Aman
    Misu, Teruhisa
    Yamada, Kentaro
    Yanaka, Hitomi
    PROCEEDINGS OF THE 18TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: STUDENT RESEARCH WORKSHOP, 2024, : 290 - 295
  • [50] Survey on the Research Progress and Development Trend of Vision-and-Language Navigation
    Niu K.
    Wang P.
    Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics, 2022, 34 (12): : 1815 - 1827