LLM as Copilot for Coarse-Grained Vision-and-Language Navigation

Cited by: 0
Authors
Qiao, Yanyuan [1]
Liu, Qianyi [2, 3]
Liu, Jiajun [4, 5]
Liu, Jing [2, 3]
Wu, Qi [1]
Affiliations
[1] Univ Adelaide, Australian Inst Machine Learning, Adelaide, SA, Australia
[2] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
[3] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
[4] CSIRO Data61, Eveleigh, Australia
[5] Univ Queensland, Brisbane, Qld, Australia
Source
Keywords
Vision-and-Language Navigation; Large Language Models
DOI
10.1007/978-3-031-72652-1_27
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Vision-and-Language Navigation (VLN) involves guiding an agent through indoor environments using human-provided textual instructions. Coarse-grained VLN, with short and high-level instructions, has gained popularity as it closely mirrors real-world scenarios. However, a significant challenge is that these instructions are often too concise for agents to comprehend and act upon. Previous studies have explored allowing agents to seek assistance during navigation, but they typically offer rigid support drawn from pre-existing datasets or simulators. The advent of Large Language Models (LLMs) presents a novel avenue for aiding VLN agents. This paper introduces VLN-Copilot, a framework enabling agents to actively seek assistance when encountering confusion, with the LLM serving as a copilot to facilitate navigation. Our approach introduces a confusion score that quantifies the level of uncertainty in an agent's action decisions, while the LLM offers real-time detailed guidance for navigation. Experimental results on two coarse-grained VLN datasets show the efficacy of our method.
Pages: 459-476
Number of pages: 18