Vision-Language Navigation With Beam-Constrained Global Normalization

Cited by: 4
Authors
Xie, Liang [1, 2]
Zhang, Meishan [3]
Li, You [4]
Qin, Wei [1, 2]
Yan, Ye [1, 2]
Yin, Erwei [1, 2]
Affiliations
[1] Acad Mil Sci China, Natl Innovat Inst Def Technol, Beijing 100071, Peoples R China
[2] Tianjin Artificial Intelligence Innovat Ctr TAI, Tianjin 300450, Peoples R China
[3] Harbin Inst Technol Shenzhen, Inst Comp & Intelligence, Shenzhen 518055, Peoples R China
[4] China Astronaut Res & Training Ctr, Natl Key Lab Human Factors Engn, Beijing 100094, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Trajectory; Navigation; Visualization; Task analysis; Training; Natural languages; Decoding; Beam search; global normalization; sequence to sequence; vision-language navigation (VLN);
DOI
10.1109/TNNLS.2022.3183287
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Vision-language navigation (VLN) is a challenging task in which an agent navigates a realistic environment by following natural language instructions. Sequence-to-sequence modeling is one of the most promising architectures for the task; it reaches the navigation goal through a sequence of movement actions. This line of work has led to state-of-the-art performance. Recently, several studies showed that beam-search decoding during inference can yield promising performance, as it ranks multiple candidate trajectories by scoring each trajectory as a whole. However, the trajectory-level score can be seriously biased during ranking. The score is a simple average of the individual unit scores of the target-sequence actions, and these unit scores may not be comparable across trajectories because they are computed by a local discriminant classifier. To address this problem, we propose a global normalization strategy that rescales the scores at the trajectory level. Concretely, we present two global score functions to rerank all candidates in the output beam, resulting in more comparable trajectory scores. In this way, the bias problem can be greatly alleviated. We conduct experiments on the benchmark room-to-room (R2R) dataset of VLN to verify our method, and the results show that the proposed global method is effective, providing significant performance gains over the corresponding baselines. Our final model achieves competitive performance on the VLN leaderboard.
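The contrast the abstract draws, between averaging locally normalized step scores and normalizing whole-trajectory scores within the beam, can be illustrated with a minimal sketch. The abstract does not specify the paper's two global score functions, so the softmax-over-the-beam function below is only an assumed stand-in, and every name and number (`local_average_score`, `beam_normalized_scores`, the two candidate trajectories and their scores) is hypothetical.

```python
import math
from typing import Dict, List


def local_average_score(step_log_probs: List[float]) -> float:
    """Baseline ranking score: mean of per-step log-probabilities.

    Each step score comes from a classifier normalized over that step's
    own action set, so values may not be comparable across trajectories.
    """
    return sum(step_log_probs) / len(step_log_probs)


def beam_normalized_scores(trajectory_scores: Dict[str, float]) -> Dict[str, float]:
    """Assumed global score: softmax over the candidates in the beam.

    This is an illustrative choice, not the paper's score function.
    """
    peak = max(trajectory_scores.values())  # subtract the max for numerical stability
    exps = {name: math.exp(s - peak) for name, s in trajectory_scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


# Hypothetical beam with two candidate trajectories.
beam_log_probs = {
    "short_path": [-0.2, -0.4],
    "long_path": [-0.1, -0.3, -0.2, -0.8],
}
# Hypothetical unnormalized whole-trajectory scores for the same candidates.
beam_raw_scores = {"short_path": 3.1, "long_path": 4.0}

local_scores = {name: local_average_score(lp) for name, lp in beam_log_probs.items()}
global_scores = beam_normalized_scores(beam_raw_scores)

best_local = max(local_scores, key=local_scores.get)
best_global = max(global_scores, key=global_scores.get)
print("best by local average:", best_local)
print("best by beam-normalized score:", best_global)
```

With these made-up numbers the two rankings disagree, which is the kind of case where reranking at the trajectory level would change the selected path.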
Pages: 1352-1363
Page count: 12