Local Slot Attention for Vision-and-Language Navigation

Cited by: 1

Authors:

Zhuang, Yifeng [1]

Sun, Qiang [1]

Fu, Yanwei [2]

Chen, Lifeng [1]

Xue, Xiangyang [1]

Affiliations:

[1] Fudan Univ, Shanghai, Peoples R China

[2] Fudan Univ, Sch Data Sci, Shanghai, Peoples R China

Source:

PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2022 | 2022

Keywords:

vision-and-language navigation; slot attention; local attention

DOI:

10.1145/3512527.3531366

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

Vision-and-language navigation (VLN), a frontier study aiming to pave the way for general-purpose robots, has been a hot topic in the computer vision and natural language processing communities. The VLN task requires an agent to navigate to a goal location by following natural language instructions in unfamiliar environments. Recently, transformer-based models have achieved significant improvements on the VLN task, since the attention mechanism in the transformer architecture can better integrate inter- and intra-modal information of vision and language. However, there are two problems in current transformer-based models: 1) the models process each view independently without taking the integrity of the objects into account; 2) during the self-attention operation in the visual modality, views that are spatially distant can be interwoven with each other without explicit restriction. This kind of mixing may introduce extra noise instead of useful information. To address these issues, we propose 1) a slot-attention based module to incorporate information from segmentation of the same object, and 2) a local attention mask mechanism to limit the visual attention span. The proposed modules can be easily plugged into any VLN architecture, and we use the Recurrent VLN-Bert as our base model. Experiments on the R2R dataset show that our model achieves state-of-the-art results.
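
The local attention mask idea in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the panorama layout (36 views arranged as 3 elevations by 12 headings, indexed elevation-major), the `radius` parameter, and the helper names `local_attention_mask` and `masked_softmax`. The sketch only shows how a mask can restrict each view to attend to spatially neighboring views.

```python
import math

def local_attention_mask(n_headings=12, n_elevations=3, radius=1):
    """Return an N x N boolean mask; True means view i may attend to view j.

    Assumed layout (hypothetical): view index = elevation * n_headings + heading.
    """
    n = n_headings * n_elevations
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        elev_i, head_i = divmod(i, n_headings)
        for j in range(n):
            elev_j, head_j = divmod(j, n_headings)
            # Headings wrap around the panorama; elevations do not.
            d_head = abs(head_i - head_j)
            d_head = min(d_head, n_headings - d_head)
            d_elev = abs(elev_i - elev_j)
            mask[i][j] = d_head <= radius and d_elev <= radius
    return mask

def masked_softmax(scores, mask_row):
    """Softmax over one row of attention scores; masked positions get weight 0."""
    top = max(s for s, keep in zip(scores, mask_row) if keep)
    exps = [math.exp(s - top) if keep else 0.0
            for s, keep in zip(scores, mask_row)]
    total = sum(exps)
    return [e / total for e in exps]

mask = local_attention_mask()
# Uniform scores for view 0: weight is spread only over its allowed neighbors.
weights = masked_softmax([0.0] * 36, mask[0])
```

With these assumed settings, a view in the lowest elevation row may attend to its own row and the row above, while distant views receive zero attention weight.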

Pages: 545-553

Number of pages: 9

