Knowledge distilled pre-training model for vision-language-navigation

Cited by: 0
Authors
Bo Huang
Shuai Zhang
Jitao Huang
Yijun Yu
Zhicai Shi
Yujie Xiong
Affiliations
[1] Shanghai University of Engineering Science, School of Electronic and Electrical Engineering
[2] China Telecom Corporation Limited Shanghai Branch
[3] Shanghai Key Laboratory of Integrated Administration Technologies for Information Security
Source
Applied Intelligence | 2023 / Vol. 53
Keywords
Natural language processing; Computer vision; Cross-modality; Deep learning
DOI
Not available
Abstract
Vision-language-navigation (VLN) is a challenging task that requires a robot to move autonomously to a destination based on visual observations while following a human's natural language instructions. To improve performance and generalization, transformer-based pre-training models are used in place of traditional methods. However, such pre-training models are poorly suited to sustainable computing and practical applications because of their heavy computation and large hardware footprint. We therefore propose a lightweight pre-training model obtained through knowledge distillation. Through knowledge distillation, the rich knowledge encoded in a large "teacher" model can be transferred to a small "student" model, which greatly reduces the number of model parameters and the inference time while largely maintaining the original performance. In the experiments, the model size is reduced by 87% and the average inference time by approximately 86%, so the model can be trained and run much faster. At the same time, 95% of the original model's performance is maintained, which is still better than traditional VLN models.
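The abstract does not state which distillation objective is used. As a minimal sketch, assuming the classic soft-target formulation (temperature-softened teacher outputs combined with a hard-label cross-entropy term), the teacher-to-student transfer could look like the following. The function names, the temperature, and the weight `alpha` are illustrative assumptions and are not taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Hypothetical soft-target distillation loss for a single example.

    alpha weights the soft (teacher-matching) term; (1 - alpha) weights the
    ordinary cross-entropy against the ground-truth label.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)

    # KL(teacher || student) on the temperature-softened distributions.
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)

    # Standard cross-entropy of the student (temperature 1) on the true label.
    ce = -math.log(softmax(student_logits)[label])

    # The T^2 factor keeps the soft-term gradient scale comparable across temperatures.
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]   # made-up teacher logits over three actions
    student = [1.0, 0.8, -0.2]   # made-up student logits
    print(distillation_loss(student, teacher, label=0))
```

In this sketch the soft term vanishes when the student reproduces the teacher's logits exactly, so the remaining loss comes only from the hard-label term.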
Pages: 5607 - 5619
Page count: 12
Related papers
50 in total
  • [1] Knowledge distilled pre-training model for vision-language-navigation
    Huang, Bo
    Zhang, Shuai
    Huang, Jitao
    Yu, Yijun
    Shi, Zhicai
    Xiong, Yujie
    [J]. APPLIED INTELLIGENCE, 2023, 53 (05) : 5607 - 5619
  • [2] Knowledge Enhanced Pre-Training Model for Vision-Language-Navigation Task
    HUANG Jitao
    ZENG Guohui
    HUANG Bo
    GAO Yongbin
    LIU Jin
    SHI Zhicai
    [J]. Wuhan University Journal of Natural Sciences, 2021, 26 (02) : 147 - 155
  • [3] Simultaneously Training and Compressing Vision-and-Language Pre-Training Model
    Qi, Qiaosong
    Zhang, Aixi
    Liao, Yue
    Sun, Wenyu
    Wang, Yongliang
    Li, Xiaobo
    Liu, Si
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 8194 - 8203
  • [4] Retrieval-based Knowledge Augmented Vision Language Pre-training
    Rao, Jiahua
    Shan, Zifei
    Liu, Longpo
    Zhou, Yao
    Yang, Yuedong
    [J]. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 5399 - 5409
  • [5] Pre-training A Prompt Pool for Vision-Language Model
    Liu, Jun
    Gu, Yang
    Yang, Zhaohua
    Guo, Shuai
    Liu, Huaqiu
    Chen, Yiqiang
    [J]. 2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [6] Survey on Vision-language Pre-training
    Yin, Jiong
    Zhang, Zhe-Dong
    Gao, Yu-Han
    Yang, Zhi-Wen
    Li, Liang
    Xiao, Mang
    Sun, Yao-Qi
    Yan, Cheng-Gang
    [J]. Ruan Jian Xue Bao/Journal of Software, 2023, 34 (05) : 2000 - 2023
  • [7] RELATION ENHANCED VISION LANGUAGE PRE-TRAINING
    Lee, Ju-Hee
    Kang, Je-Won
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 2286 - 2290
  • [8] Grounded Entity-Landmark Adaptive Pre-training for Vision-and-Language Navigation
    Cui, Yibo
    Xie, Liang
    Zhang, Yakun
    Zhang, Meishan
    Yan, Ye
    Yin, Erwei
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 12009 - 12019
  • [9] Cross-modal Semantic Alignment Pre-training for Vision-and-Language Navigation
    Wu, Siying
    Fu, Xueyang
    Wu, Feng
    Zha, Zheng-Jun
    [J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 4233 - 4241
  • [10] Knowledge Boosting: Rethinking Medical Contrastive Vision-Language Pre-training
    Chen, Xiaofei
    He, Yuting
    Xue, Cheng
    Ge, Rongjun
    Li, Shuo
    Yang, Guanyu
    [J]. MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2023, PT I, 2023, 14220 : 405 - 415