A Cross-Modal Object-Aware Transformer for Vision-and-Language Navigation

Cited by: 0

Authors:

Ni, Han [1]

Chen, Jia [1]

Zhu, DaYong [1]

Shi, Dianxi [1]

Affiliations:

[1] Natl Univ Def Technol, Univ Elect Sci & Technol China, Changsha, Peoples R China

Source:

2022 IEEE 34TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, ICTAI | 2022

Keywords:

vision-and-language navigation; cross-modal object; transformer;

DOI:

10.1109/ICTAI56018.2022.00149

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Vision-and-language navigation (VLN) instructions combine cross-modal object references and scene descriptions to provide a breadcrumb trail to a goal location. Existing VLN approaches often do not take full advantage of cross-modal object information. This work proposes a transformer network that perceives cross-modal object data, fusing and aligning the two cue features of each object reference to help agents capture object features. In our method, linguistic object processing provides semantic-level contextual information for visual object features. With this design, our model leverages object features to help the agent improve performance on the R2R and R4R benchmarks. Extensive experiments on both benchmarks demonstrate the effectiveness of the proposed model: our method achieves absolute improvements of 1.6% in SPL on R2R and 2.1% in CLS on R4R. Our analysis shows that the network performs better on longer, heavily object-referenced navigation instructions, which indicates that our approach is better able to use object features and align them to references in the instructions.


Pages: 976-981

Page count: 6

