GridMM: Grid Memory Map for Vision-and-Language Navigation

Cited by: 4

Authors:

Wang, Zihan [1,2]

Li, Xiangyang [1,2]

Yang, Jiahao [1,2]

Liu, Yeqi [1,2]

Jiang, Shuqiang [1,2]

Affiliations:

[1] Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, People's Republic of China

[2] University of Chinese Academy of Sciences, Beijing 100049, People's Republic of China

Source:

2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023) | 2023

Funding:

National Natural Science Foundation of China

Keywords:

DOI:

10.1109/ICCV51070.2023.01432

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Vision-and-language navigation (VLN) enables an agent to navigate to a remote location in a 3D environment by following a natural language instruction. To represent the previously visited environment, most VLN approaches implement memory using recurrent states, topological maps, or top-down semantic maps. In contrast to these approaches, we build a top-down, egocentric, and dynamically growing Grid Memory Map (GridMM) to structure the visited environment. From a global perspective, historical observations are projected into a unified grid map in a top-down view, which better represents the spatial relations of the environment. From a local perspective, we further propose an instruction relevance aggregation method to capture fine-grained visual clues in each grid region. Extensive experiments are conducted on the REVERIE, R2R, and SOON datasets in discrete environments and on the R2R-CE dataset in continuous environments, showing the superiority of the proposed method. The source code is available at https://github.com/MrZihan/GridMM.
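The two ideas in the abstract (projecting past observations into an egocentric top-down grid, then aggregating the features in each cell by their relevance to the instruction) can be illustrated with a small sketch. This is not the authors' implementation; see the linked repository for that. The function names `build_grid_memory` and `aggregate_cell`, the 2D feature vectors, the `cell_size` parameter, and the dot-product-plus-softmax relevance score are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def build_grid_memory(points, features, agent_xy, agent_heading, cell_size=0.5):
    """Bucket observed features into egocentric top-down grid cells.

    Hypothetical helper: `points` are world-frame (x, y) positions of
    observed features. The grid is a dict, so it grows as new cells are seen.
    """
    # Rotate by -heading so the map is expressed in the agent's own frame.
    cos_h, sin_h = math.cos(-agent_heading), math.sin(-agent_heading)
    grid = defaultdict(list)
    for (x, y), feat in zip(points, features):
        dx, dy = x - agent_xy[0], y - agent_xy[1]
        ex = dx * cos_h - dy * sin_h
        ey = dx * sin_h + dy * cos_h
        cell = (math.floor(ex / cell_size), math.floor(ey / cell_size))
        grid[cell].append(feat)
    return grid


def aggregate_cell(cell_feats, instruction_vec):
    """Softmax-weighted mean of a cell's features.

    Assumed relevance score: dot product between each feature and the
    instruction vector (an illustrative stand-in, not the paper's formula).
    """
    scores = [sum(f * q for f, q in zip(feat, instruction_vec)) for feat in cell_feats]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # shift scores for numerical stability
    total = sum(weights)
    dim = len(instruction_vec)
    return [
        sum(w * feat[d] for w, feat in zip(weights, cell_feats)) / total
        for d in range(dim)
    ]


# Toy usage: two nearby observations share a cell, a distant one gets its own.
grid = build_grid_memory(
    points=[(1.0, 0.0), (1.1, 0.1), (3.0, 0.0)],
    features=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    agent_xy=(0.0, 0.0),
    agent_heading=0.0,
)
cell_summary = aggregate_cell(grid[(2, 0)], instruction_vec=[10.0, 0.0])
```

In this toy setup the instruction vector points along the first feature dimension, so the aggregated cell feature is dominated by the observation that matches it. Rebuilding the grid from the agent's current pose at each step is what keeps the map egocentric in this sketch.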


Pages: 15579-15590

Page count: 12

50 items in total

[1] Memory-Adaptive Vision-and-Language Navigation
He, Keji
Jing, Ya
Huang, Yan
Lu, Zhihe
An, Dong
Wang, Liang
[J]. PATTERN RECOGNITION, 2024, 153
[2] ESceme: Vision-and-Language Navigation with Episodic Scene Memory
Zheng, Qi
Liu, Daqing
Wang, Chaoyue
Zhang, Jing
Wang, Dadong
Tao, Dacheng
[J]. INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024,
[3] Iterative Vision-and-Language Navigation
Krantz, Jacob
Banerjee, Shurjo
Zhu, Wang
Corso, Jason
Anderson, Peter
Lee, Stefan
Thomason, Jesse
[J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 14921 - 14930
[4] On the Evaluation of Vision-and-Language Navigation Instructions
Zhao, Ming
Anderson, Peter
Jain, Vihan
Wang, Su
Ku, Alexander
Baldridge, Jason
Ie, Eugene
[J]. 16TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2021), 2021, : 1302 - 1316
[5] Recent Advances in Vision-and-language Navigation
Sima S.-L.
Huang Y.
He K.-J.
An D.
Yuan H.
Wang L.
[J]. Zidonghua Xuebao/Acta Automatica Sinica, 2023, 49 (01): 1 - 14
[6] Curriculum Learning for Vision-and-Language Navigation
Zhang, Jiwen
Wei, Zhongyu
Fan, Jianqing
Peng, Jiajie
[J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
[7] Episodic Transformer for Vision-and-Language Navigation
Pashevich, Alexander
Schmid, Cordelia
Sun, Chen
[J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 15922 - 15932
[8] WebVLN: Vision-and-Language Navigation on Websites
Chen, Qi
Pitawela, Dileepa
Zhao, Chongyang
Zhou, Gengze
Chen, Hsiang-Ting
Wu, Qi
[J]. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 2, 2024, : 1165 - 1173
[9] Multimodal Transformer with Variable-Length Memory for Vision-and-Language Navigation
Lin, Chuang
Jiang, Yi
Cai, Jianfei
Qu, Lizhen
Haffari, Gholamreza
Yuan, Zehuan
[J]. COMPUTER VISION, ECCV 2022, PT XXXVI, 2022, 13696 : 380 - 397
[10] Local Slot Attention for Vision-and-Language Navigation
Zhuang, Yifeng
Sun, Qiang
Fu, Yanwei
Chen, Lifeng
Xue, Xiangyang
[J]. PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2022, 2022, : 545 - 553
