Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions

Cited by: 0

Authors:

Gu, Jing [1]

Stefani, Eliana [1]

Wu, Qi [2]

Thomason, Jesse [3]

Wang, Xin Eric [1]

Affiliations:

[1] Univ Calif Santa Cruz, Santa Cruz, CA 95064 USA

[2] Univ Adelaide, Adelaide, SA, Australia

[3] Univ Southern Calif, Los Angeles, CA 90007 USA

Source:

PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS) | 2022

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

A long-term goal of AI research is to build intelligent agents that can communicate with humans in natural language, perceive the environment, and perform real-world tasks. Vision-and-Language Navigation (VLN) is a fundamental and interdisciplinary research topic towards this goal, and receives increasing attention from natural language processing, computer vision, robotics, and machine learning communities. In this paper, we review contemporary studies in the emerging field of VLN, covering tasks, evaluation metrics, methods, etc. Through structured analysis of current progress and challenges, we highlight the limitations of current VLN and opportunities for future work. This paper serves as a thorough reference for the VLN research community.

Pages: 7606 - 7623

Page count: 18

50 entries in total

[41] Are You Looking? Grounding to Multiple Modalities in Vision-and-Language Navigation
Hu, Ronghang
Fried, Daniel
Rohrbach, Anna
Klein, Dan
Darrell, Trevor
Saenko, Kate
57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019 : 6551 - 6557
[42] Cluster-based Curriculum Learning for Vision-and-Language Navigation
Wang, Ting
Wu, Zongkai
Liu, Zihan
Wang, Donglin
2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022
[43] Vision-and-Language Navigation via Latent Semantic Alignment Learning
Wu, Siying
Fu, Xueyang
Wu, Feng
Zha, Zheng-Jun
IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 8406 - 8418
[44] Joint Multimodal Embedding and Backtracking Search in Vision-and-Language Navigation
Hwang, Jisu
Kim, Incheol
SENSORS, 2021, 21 (03) : 1 - 23
[45] FedVLN: Privacy-Preserving Federated Vision-and-Language Navigation
Zhou, Kaiwen
Wang, Xin Eric
COMPUTER VISION, ECCV 2022, PT XXXVI, 2022, 13696 : 682 - 699
[46] Enhancing Scene Understanding for Vision-and-Language Navigation by Knowledge Awareness
Gao, Fang
Tang, Jingfeng
Wang, Jiabao
Li, Shaodong
Yu, Jun
IEEE ROBOTICS AND AUTOMATION LETTERS, 2024, 9 (12) : 10874 - 10881
[47] Take the Scenic Route: Improving Generalization in Vision-and-Language Navigation
Yu, Felix
Deng, Zhiwei
Narasimhan, Karthik
Russakovsky, Olga
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2020), 2020 : 4000 - 4004
[48] Tree-Structured Trajectory Encoding for Vision-and-Language Navigation
Zhou, Xinzhe
Mu, Yadong
THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 3, 2023 : 3814 - 3824
[49] LLM as Copilot for Coarse-Grained Vision-and-Language Navigation
Qiao, Yanyuan
Liu, Qianyi
Liu, Jiajun
Liu, Jing
Wu, Qi
COMPUTER VISION - ECCV 2024, PT V, 2025, 15063 : 459 - 476
[50] VLN↻BERT: A Recurrent Vision-and-Language BERT for Navigation
Hong, Yicong
Wu, Qi
Qi, Yuankai
Rodriguez-Opazo, Cristian
Gould, Stephen
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021 : 1643 - 1653
