Auxiliary Fine-grained Alignment Constraints for Vision-and-Language Navigation

Authors
Cui, Yibo [1, 2]
Huang, Ruqiang [2]
Zhang, Yakun [1, 2]
Cen, Yingjie [3]
Xie, Liang [1, 2]
Yan, Ye [1, 2]
Yin, Erwei [1, 2]
Affiliations
[1] Acad Mil Sci China, Natl Innovat Inst Def Technol, Beijing 100071, Peoples R China
[2] Tianjin Artificial Intelligence Innovat Ctr TAIIC, Tianjin 300450, Peoples R China
[3] East China Univ Sci & Technol, Sch Informat Sci & Engn, Shanghai 200237, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Vision-and-Language Navigation; Fine-grained Cross-modal Alignment; Auxiliary Constraint;
DOI
10.1109/ICME55011.2023.00446
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Vision-and-Language Navigation (VLN) requires a visual agent to navigate photo-realistic environments by following instructions. Fine-grained cross-modal alignment is a critical challenge in VLN because the agent must focus on the particular sub-part of the complete instruction that governs its next movement. However, previous work did not explicitly supervise the matching between a sub-trajectory and its corresponding sub-instruction. In this paper, we propose Auxiliary Fine-grained Alignment Constraints (AFAC) to facilitate the learning of decision-making during navigation. AFAC consists of two constraints, the Attention Alignment Constraint (AAC) and the Representation Alignment Constraint (RAC), which produce additional supervisory signals from the perspectives of attention and representation, respectively. We test our method on the Landmark-RxR benchmark and achieve state-of-the-art results in both seen and unseen environments.
Pages: 2621-2626
Page count: 6