Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions

Times cited: 0
Authors
Gu, Jing [1]
Stefani, Eliana [1]
Wu, Qi [2]
Thomason, Jesse [3]
Wang, Xin Eric [1]
Affiliations
[1] Univ Calif Santa Cruz, Santa Cruz, CA 95064 USA
[2] Univ Adelaide, Adelaide, SA, Australia
[3] Univ Southern Calif, Los Angeles, CA 90007 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
A long-term goal of AI research is to build intelligent agents that can communicate with humans in natural language, perceive the environment, and perform real-world tasks. Vision-and-Language Navigation (VLN) is a fundamental and interdisciplinary research topic towards this goal, and receives increasing attention from the natural language processing, computer vision, robotics, and machine learning communities. In this paper, we review contemporary studies in the emerging field of VLN, covering tasks, evaluation metrics, methods, etc. Through structured analysis of current progress and challenges, we highlight the limitations of current VLN and opportunities for future work. This paper serves as a thorough reference for the VLN research community.
Pages: 7606-7623
Page count: 18