WebVLN: Vision-and-Language Navigation on Websites

Times cited: 0

Authors:

Chen, Qi [1]

Pitawela, Dileepa [1]

Zhao, Chongyang [1]

Zhou, Gengze [1]

Chen, Hsiang-Ting [1]

Wu, Qi [1]

Affiliation:

[1] Univ Adelaide, Australian Inst Machine Learning, Adelaide, SA, Australia

Source:

THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 2 | 2024


DOI: Not available

Chinese Library Classification (CLC) number:

TP18 [Theory of Artificial Intelligence]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

The Vision-and-Language Navigation (VLN) task aims to enable AI agents to accurately understand and follow natural language instructions to navigate through real-world environments and ultimately reach specific target locations. We recognise a promising opportunity to extend VLN to a comparable navigation task that holds substantial significance in our daily lives, albeit within the virtual realm: navigating websites on the Internet. This paper proposes a new task named Vision-and-Language Navigation on Websites (WebVLN), in which we use question-based instructions to train an agent, emulating how users naturally browse websites. Unlike the existing VLN task, which attends only to vision and instructions (language), the WebVLN agent also considers underlying web-specific content such as HTML, which cannot be seen on rendered web pages yet contains rich visual and textual information. Toward this goal, we contribute a dataset, WebVLN-v1, and introduce a novel approach called Website-aware VLN Network (WebVLN-Net), which builds on state-of-the-art VLN techniques. Experimental results show that WebVLN-Net outperforms current VLN and web-related navigation methods. We believe that the new WebVLN task and its dataset will open a new dimension within the VLN domain and contribute to the broader vision-and-language research community. Code is available at: https://github.com/WebVLN/WebVLN.
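The abstract's key technical point is that a page's HTML carries information the rendered page does not show. As a minimal illustration only (this is not WebVLN-Net, whose architecture is not described in this record), the Python sketch below uses the standard-library `html.parser` module to collect text hidden in `alt`, `title`, and `aria-label` attributes plus link targets, then ranks those cues by word overlap with a question-style instruction. The attribute list, the function names `extract_hidden_cues` and `rank_cues`, and the word-overlap heuristic are all hypothetical choices made for this example.

```python
import re
from html.parser import HTMLParser

# Hypothetical choice: attributes whose text is not directly visible on the
# rendered page but may help an agent decide where to click.
HIDDEN_TEXT_ATTRS = ("alt", "title", "aria-label")


class HiddenTextExtractor(HTMLParser):
    """Collects (tag, attribute, value) triples for non-rendered text and link targets."""

    def __init__(self):
        super().__init__()
        self.cues = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        for name, value in attrs:
            is_link = tag == "a" and name == "href"
            if value and (name in HIDDEN_TEXT_ATTRS or is_link):
                self.cues.append((tag, name, value))


def extract_hidden_cues(html):
    """Return hidden-text cues in document order."""
    parser = HiddenTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.cues


def tokens(text):
    """Lowercase alphanumeric word set, used for a crude overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_cues(question, cues):
    """Toy heuristic (hypothetical): sort cues by word overlap with the question.

    sorted() is stable, so ties keep their document order.
    """
    q = tokens(question)
    return sorted(cues, key=lambda cue: len(q & tokens(cue[2])), reverse=True)


if __name__ == "__main__":
    page = (
        '<a href="/cart" title="View shopping cart"><img src="c.png" alt="Cart"></a>'
        '<button aria-label="Search products"></button>'
    )
    cues = extract_hidden_cues(page)
    best = rank_cues("How many items are in my shopping cart?", cues)[0]
    print(best)
```

A learned agent would presumably replace the word-overlap score with trained encoders over the screenshot, the HTML, and the question; the toy ranking only shows why the HTML channel can add signal beyond the visible pixels.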


Pages: 1165-1173

Page count: 9
