WebVLN: Vision-and-Language Navigation on Websites

Authors
Chen, Qi [1]
Pitawela, Dileepa [1]
Zhao, Chongyang [1]
Zhou, Gengze [1]
Chen, Hsiang-Ting [1]
Wu, Qi [1]
Affiliations
[1] Univ Adelaide, Australian Inst Machine Learning, Adelaide, SA, Australia
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The Vision-and-Language Navigation (VLN) task aims to enable AI agents to accurately understand and follow natural language instructions to navigate through real-world environments, ultimately reaching specific target locations. We recognise a promising opportunity to extend VLN to a comparable navigation task that holds substantial significance in our daily lives, albeit within the virtual realm: navigating websites on the Internet. This paper proposes a new task named Vision-and-Language Navigation on Websites (WebVLN), where we use question-based instructions to train an agent, emulating how users naturally browse websites. Unlike the existing VLN task, which attends only to vision and instruction (language), the WebVLN agent also considers underlying web-specific content such as HTML, which cannot be seen on the rendered web pages yet contains rich visual and textual information. Toward this goal, we contribute a dataset, WebVLN-v1, and introduce a novel approach called Website-aware VLN Network (WebVLN-Net), which is built upon state-of-the-art VLN techniques. Experimental results show that WebVLN-Net outperforms current VLN and web-related navigation methods. We believe that the introduction of the new WebVLN task and its dataset will establish a new dimension within the VLN domain and contribute to the broader vision-and-language research community. Code is available at: https://github.com/WebVLN/WebVLN.
Pages: 1165-1173
Page count: 9