Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions

Cited by: 0

Authors:

Gu, Jing [1]

Stefani, Eliana [1]

Wu, Qi [2]

Thomason, Jesse [3]

Wang, Xin Eric [1]

Affiliations:

[1] Univ Calif Santa Cruz, Santa Cruz, CA 95064 USA

[2] Univ Adelaide, Adelaide, SA, Australia

[3] Univ Southern Calif, Los Angeles, CA 90007 USA

Source:

PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS) | 2022

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

A long-term goal of AI research is to build intelligent agents that can communicate with humans in natural language, perceive the environment, and perform real-world tasks. Vision-and-Language Navigation (VLN) is a fundamental and interdisciplinary research topic towards this goal, and receives increasing attention from natural language processing, computer vision, robotics, and machine learning communities. In this paper, we review contemporary studies in the emerging field of VLN, covering tasks, evaluation metrics, methods, etc. Through structured analysis of current progress and challenges, we highlight the limitations of current VLN and opportunities for future work. This paper serves as a thorough reference for the VLN research community.


Pages: 7606 - 7623

Page count: 18
