A Cross-Modal Object-Aware Transformer for Vision-and-Language Navigation

Cited by: 0

Authors:

Ni, Han ^{[1]}

Chen, Jia ^{[1]}

Zhu, DaYong ^{[1]}

Shi, Dianxi ^{[1]}

Affiliations:

[1] Natl Univ Def Technol, Univ Elect Sci & Technol China, Changsha, Peoples R China

Source:

2022 IEEE 34TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, ICTAI | 2022

Keywords:

vision-and-language navigation; cross-modal object; transformer;

DOI:

10.1109/ICTAI56018.2022.00149

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Vision-and-language navigation (VLN) combines cross-modal object references and scene descriptions to provide a breadcrumb trail to a goal location. Existing VLN approaches often do not take full advantage of cross-modal object information. This work proposes a cross-modal object-aware transformer network that fuses and aligns the two kinds of object-reference cues, visual and linguistic, to help agents capture object features. In our method, linguistic object processing provides semantic-level contextual information for visual object features. With this design, the model leverages object features to improve the agent's performance on the R2R and R4R benchmarks. Extensive experiments on both benchmarks demonstrate its effectiveness: the method yields absolute improvements of 1.6% in SPL on R2R and 2.1% in CLS on R4R. Our analysis shows that the network performs better on longer, heavily object-referenced navigation instructions, which indicates that the approach is better able to use object features and align them with the object references in the instructions.


Pages: 976-981

Page count: 6
