HOP: History-and-Order Aware Pre-training for Vision-and-Language Navigation

被引:24
|
作者
Qiao, Yanyuan [1 ]
Qi, Yuankai [1 ]
Hong, Yicong [2 ]
Yu, Zheng [1 ]
Wang, Peng [3 ]
Wu, Qi [1 ]
机构
[1] Univ Adelaide, Adelaide, SA, Australia
[2] Australian Natl Univ, Canberra, ACT, Australia
[3] Northwestern Polytech Univ, Xian, Peoples R China
关键词
D O I
10.1109/CVPR52688.2022.01498
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Pre-training has been adopted in a few of recent works for Vision-and-Language Navigation (VLN). However, previous pre-training methods for VLN either lack the ability to predict future actions or ignore the trajectory contexts, which are essential for a greedy navigation process. In this work, to promote the learning of spatio-temporal visual-textual correspondence as well as the agent's capability of decision making, we propose a novel history-and-order aware pre-training paradigm (HOP) with VLN-specific objectives that exploit the past observations and support future action prediction. Specifically, in addition to the commonly used Masked Language Modeling (MLM) and Trajectory-Instruction Matching (TIM), we design two proxy tasks to model temporal order information: Trajectory Order Modeling (TOM) and Group Order Modeling (GOM). Moreover, our navigation action prediction is also enhanced by introducing the task of Action Prediction with History (APH), which takes into account the history visual perceptions. Extensive experimental results on four downstream VLN tasks (R2R, REVERIE, NDH, RxR) demonstrate the effectiveness of our proposed method compared against several state-of-the-art agents.
引用
收藏
页码:15397 / 15406
页数:10
相关论文
共 50 条
  • [21] On the Evaluation of Vision-and-Language Navigation Instructions
    Zhao, Ming
    Anderson, Peter
    Jain, Vihan
    Wang, Su
    Ku, Alexander
    Baldridge, Jason
    Ie, Eugene
    16TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2021), 2021, : 1302 - 1316
  • [22] Recent Advances in Vision-and-language Navigation
    Sima S.-L.
    Huang Y.
    He K.-J.
    An D.
    Yuan H.
    Wang L.
    Zidonghua Xuebao/Acta Automatica Sinica, 2023, 49 (01): : 1 - 14
  • [23] Curriculum Learning for Vision-and-Language Navigation
    Zhang, Jiwen
    Wei, Zhongyu
    Fan, Jianqing
    Peng, Jiajie
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [24] Episodic Transformer for Vision-and-Language Navigation
    Pashevich, Alexander
    Schmid, Cordelia
    Sun, Chen
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 15922 - 15932
  • [25] WebVLN: Vision-and-Language Navigation on Websites
    Chen, Qi
    Pitawela, Dileepa
    Zhao, Chongyang
    Zhou, Gengze
    Chen, Hsiang-Ting
    Wu, Qi
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 2, 2024, : 1165 - 1173
  • [26] UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training
    Zhou, Mingyang
    Zhou, Luowei
    Wang, Shuohang
    Cheng, Yu
    Li, Linjie
    Yu, Zhou
    Liu, Jingjing
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 4153 - 4163
  • [27] DU-VLG: Unifying Vision-and-Language Generation via Dual Sequence-to-Sequence Pre-training
    Huang, Luyang
    Niu, Guocheng
    Liu, Jiachen
    Xiao, Xinyan
    Wu, Hua
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022, : 2552 - 2566
  • [28] A Cross-Modal Object-Aware Transformer for Vision-and-Language Navigation
    Ni, Han
    Chen, Jia
    Zhu, DaYong
    Shi, Dianxi
    2022 IEEE 34TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, ICTAI, 2022, : 976 - 981
  • [29] SOAT: A Scene- and Object-Aware Transformer for Vision-and-Language Navigation
    Moudgil, Abhinav
    Majumdar, Arjun
    Agrawal, Harsh
    Lee, Stefan
    Batra, Dhruv
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021,
  • [30] Continual pre-training mitigates forgetting in language and vision
    Cossu A.
    Carta A.
    Passaro L.
    Lomonaco V.
    Tuytelaars T.
    Bacciu D.
    Neural Networks, 2024, 179