Auxiliary Fine-grained Alignment Constraints for Vision-and-Language Navigation

Authors
Cui, Yibo [1, 2]
Huang, Ruqiang [2]
Zhang, Yakun [1, 2]
Cen, Yingjie [3]
Xie, Liang [1, 2]
Yan, Ye [1, 2]
Yin, Erwei [1, 2]
Affiliations
[1] Acad Mil Sci China, Natl Innovat Inst Def Technol, Beijing 100071, Peoples R China
[2] Tianjin Artificial Intelligence Innovat Ctr TAIIC, Tianjin 300450, Peoples R China
[3] East China Univ Sci & Technol, Sch Informat Sci & Engn, Shanghai 200237, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Vision-and-Language Navigation; Fine-grained Cross-modal Alignment; Auxiliary Constraint
DOI
10.1109/ICME55011.2023.00446
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Vision-and-Language Navigation (VLN) requires a visual agent to navigate photo-realistic environments by following natural-language instructions. Fine-grained cross-modal alignment is a critical challenge in VLN, because at each step the agent must focus on the particular sub-part of the complete instruction that governs its next movement. However, previous work has not provided explicit supervision for matching each sub-trajectory to its corresponding sub-instruction. In this paper, we propose Auxiliary Fine-grained Alignment Constraints (AFAC) to facilitate the learning of navigation decisions. AFAC consists of two constraints, the Attention Alignment Constraint (AAC) and the Representation Alignment Constraint (RAC), which provide additional supervision signals from the perspectives of attention and representation, respectively. We evaluate our method on the Landmark-RxR benchmark and achieve state-of-the-art results in both seen and unseen environments.
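The abstract does not give the loss formulations. Purely as an illustration, the sketch below shows one hypothetical way that attention-level and representation-level alignment signals could be computed from sub-instruction/sub-trajectory pairs. Every function name, the negative-log-attention-mass form of the attention term, the InfoNCE-style contrastive form of the representation term, the temperature, and the loss weights are assumptions made for this sketch; they are not the AAC and RAC definitions from the paper.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical sketch only: NOT the paper's AAC/RAC formulations.


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_alignment_loss(attn_logits: Sequence[float],
                             sub_span: Tuple[int, int]) -> float:
    """Assumed attention-level term: negative log of the attention mass
    that falls on the tokens of the annotated sub-instruction.

    attn_logits: agent's attention logits over instruction tokens at one step.
    sub_span: half-open (start, end) token range of the current sub-instruction.
    """
    probs = softmax(attn_logits)
    start, end = sub_span
    mass = sum(probs[start:end])
    return -math.log(max(mass, 1e-12))  # clamp to avoid log(0)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; the epsilon guards against zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / max(norm, 1e-12)


def representation_alignment_loss(traj_reprs: Sequence[Sequence[float]],
                                  instr_reprs: Sequence[Sequence[float]],
                                  temperature: float = 0.1) -> float:
    """Assumed representation-level term: InfoNCE-style contrastive loss in
    which the i-th sub-trajectory embedding should score highest against the
    i-th sub-instruction embedding among all sub-instructions of the episode.
    """
    n = len(traj_reprs)
    total = 0.0
    for i in range(n):
        logits = [cosine(traj_reprs[i], instr_reprs[j]) / temperature
                  for j in range(n)]
        total += -math.log(softmax(logits)[i])
    return total / n


def combined_loss(nav_loss: float, aac: float, rac: float,
                  lambda_aac: float = 1.0, lambda_rac: float = 1.0) -> float:
    """Navigation loss plus weighted auxiliary terms (weights are hypothetical)."""
    return nav_loss + lambda_aac * aac + lambda_rac * rac


if __name__ == "__main__":
    # Uniform attention over 4 tokens, 2-token sub-instruction: mass 0.5.
    print(attention_alignment_loss([0.0, 0.0, 0.0, 0.0], (0, 2)))  # ln 2
    # Correctly paired embeddings should give a lower contrastive loss
    # than swapped pairings.
    traj = [[1.0, 0.0], [0.0, 1.0]]
    print(representation_alignment_loss(traj, [[1.0, 0.0], [0.0, 1.0]]))
    print(representation_alignment_loss(traj, [[0.0, 1.0], [1.0, 0.0]]))
```

In a setup like this, both auxiliary terms shrink as the agent attends to, and embeds, the correct sub-instruction, and they would simply be added to the ordinary navigation loss with tunable weights, as `combined_loss` does.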
Pages: 2621-2626
Page count: 6