Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions

Cited by: 0
Authors
Gu, Jing [1]
Stefani, Eliana [1]
Wu, Qi [2]
Thomason, Jesse [3]
Wang, Xin Eric [1]
Affiliations
[1] Univ Calif Santa Cruz, Santa Cruz, CA 95064 USA
[2] Univ Adelaide, Adelaide, SA, Australia
[3] Univ Southern Calif, Los Angeles, CA 90007 USA
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A long-term goal of AI research is to build intelligent agents that can communicate with humans in natural language, perceive the environment, and perform real-world tasks. Vision-and-Language Navigation (VLN) is a fundamental and interdisciplinary research topic towards this goal, and receives increasing attention from natural language processing, computer vision, robotics, and machine learning communities. In this paper, we review contemporary studies in the emerging field of VLN, covering tasks, evaluation metrics, methods, etc. Through structured analysis of current progress and challenges, we highlight the limitations of current VLN and opportunities for future work. This paper serves as a thorough reference for the VLN research community.
Pages: 7606-7623
Page count: 18
Related papers
50 in total
  • [21] AerialVLN: Vision-and-Language Navigation for UAVs
    Liu, Shubo
    Zhang, Hongsheng
    Qi, Yuankai
    Wang, Peng
    Zhang, Yanning
    Wu, Qi
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 15338 - 15348
  • [22] Vision-and-Language Navigation via Causal Learning
    Wang, Liuyi
    He, Zongtao
    Dang, Ronghao
    Shen, Mengjiao
    Liu, Chengju
    Chen, Qijun
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 13139 - 13150
  • [23] CLiMB: A Continual Learning Benchmark for Vision-and-Language Tasks
    Srinivasan, Tejas
    Chang, Ting-Yun
    Alva, Leticia Pinto
    Chochlakis, Georgios
    Rostami, Mohammad
    Thomason, Jesse
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [24] Unifying Vision-and-Language Tasks via Text Generation
    Cho, Jaemin
    Lei, Jie
    Tan, Hao
    Bansal, Mohit
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [25] Reinforced Vision-and-Language Navigation Based on Historical BERT
    Zhang, Zixuan
    Qi, Shuhan
    Zhou, Zihao
    Zhang, Jiajia
    Yuan, Hao
    Wang, Xuan
    Wang, Lei
    Xiao, Jing
    ADVANCES IN SWARM INTELLIGENCE, ICSI 2023, PT II, 2023, 13969 : 427 - 438
  • [26] History Aware Multimodal Transformer for Vision-and-Language Navigation
    Chen, Shizhe
    Guhur, Pierre-Louis
    Schmid, Cordelia
    Laptev, Ivan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [27] Diagnosing Vision-and-Language Navigation: What Really Matters
    Zhu, Wanrong
    Qi, Yuankai
    Narayana, Pradyumna
    Sone, Kazoo
    Basu, Sugato
    Wang, Eric Xin
    Wu, Qi
    Eckstein, Miguel
    Wang, William Yang
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, : 5981 - 5993
  • [28] Boosting Vision-and-Language Navigation with Direction Guiding and Backtracing
    Chen, Jingwen
    Luo, Jianjie
    Pan, Yingwei
    Li, Yehao
    Yao, Ting
    Chao, Hongyang
    Mei, Tao
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2023, 19 (01)
  • [29] Hierarchical Spatial Proximity Reasoning for Vision-and-Language Navigation
    Xu, Ming
    Xie, Zilong
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2024, 9 (12) : 10756 - 10763
  • [30] Stay on the Path: Instruction Fidelity in Vision-and-Language Navigation
    Jain, Vihan
    Magalhaes, Gabriel
    Ku, Alexander
    Vaswani, Ashish
    Ie, Eugene
    Baldridge, Jason
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 1862 - 1872