Knowledge Distillation Approach for Efficient Internal Language Model Estimation

Cited by: 0
Authors
Chen, Zhipeng [1 ]
Xu, Haihua [1 ]
Khassanov, Yerbolat [1 ]
He, Yi [1 ]
Lu, Lu [1 ]
Ma, Zejun [1 ]
Wu, Ji [2 ]
Affiliations
[1] ByteDance, Beijing, Peoples R China
[2] Tsinghua Univ, Dept Elect Engn, Beijing, Peoples R China
Source
INTERSPEECH 2023
Keywords
ASR; language model; ILME; density ratio; knowledge distillation; efficiency
DOI
10.21437/Interspeech.2023-2479
Chinese Library Classification
O42 [Acoustics]
Discipline Codes
070206; 082403
Abstract
Internal language model estimation (ILME) has demonstrated its efficacy in domain adaptation for end-to-end (E2E) ASR. However, the performance improvement comes at extra computational cost compared with conventional shallow fusion: to estimate the internal language model prior, one must run an additional forward pass over either the ASR decoder or a separate density ratio (DR) language model (LM) for every utterance being decoded. In this paper, we propose a knowledge distillation (KD) approach to realize efficient ILME for the Listen-Attend-Spell (LAS) E2E ASR model. First, we extensively explore diverse ILME and DR methods and find that the ILM can be approximated with a DR-LM much smaller than the original ASR decoder. Furthermore, to match the performance of ILME, we employ the estimated ILM as a teacher to train a small DR-LM via KD. In this way, we achieve the best of both worlds: performance comparable to ILME and the high efficiency of DR with a small DR-LM.
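A minimal PyTorch sketch of the two ingredients the abstract describes, under stated assumptions: an ILME-style fusion score that subtracts an internal LM prior at decode time, and a KD objective that trains a small density-ratio LM to mimic the estimated ILM so the extra forward pass over the ASR decoder can be avoided. All function names, interpolation weights, and tensor shapes are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch only (not the authors' code): (1) an ILME-style shallow-fusion
# score that subtracts an internal LM prior, and (2) a KD objective that trains a
# small density-ratio LM (student) to mimic the estimated ILM (teacher).
# All names, weights, and shapes below are assumptions for illustration.
import torch
import torch.nn.functional as F


def ilme_fused_score(asr_log_probs, ext_lm_log_probs, ilm_log_probs,
                     lm_weight=0.5, ilm_weight=0.3):
    """Per-token decoding score: ASR score plus external-LM score minus the
    internal-LM prior (the term the distilled small DR-LM is meant to supply)."""
    return asr_log_probs + lm_weight * ext_lm_log_probs - ilm_weight * ilm_log_probs


def kd_loss(teacher_ilm_logits, student_drlm_logits, temperature=1.0):
    """KL divergence between the frozen teacher ILM distribution and the small
    student DR-LM distribution, computed over the output vocabulary."""
    t = temperature
    teacher_probs = F.softmax(teacher_ilm_logits / t, dim=-1).detach()
    student_log_probs = F.log_softmax(student_drlm_logits / t, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t * t)


if __name__ == "__main__":
    # Toy shapes: a batch of 4 decoding steps over a 10-token vocabulary.
    teacher_logits = torch.randn(4, 10)   # ILM estimated from the ASR decoder
    student_logits = torch.randn(4, 10)   # small DR-LM being distilled
    print("KD loss:", kd_loss(teacher_logits, student_logits).item())
```

Under this reading, the distilled DR-LM supplies the ilm_log_probs term during beam search at a fraction of the decoder's cost, which is how the abstract's efficiency claim would be realized.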
Pages: 1339-1343
Page count: 5
Related Papers
50 records in total
  • [1] Online Knowledge Distillation for Efficient Pose Estimation
    Li, Zheng
    Ye, Jingwen
    Song, Mingli
    Huang, Ying
    Pan, Zhigeng
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 11720 - 11730
  • [2] Efficient Knowledge Distillation: Empowering Small Language Models with Teacher Model Insights
    Ballout, Mohamad
    Krumnack, Ulf
    Heidemann, Gunther
    Kuehnberger, Kai-Uwe
    NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, PT I, NLDB 2024, 2024, 14762 : 32 - 46
  • [3] Efficient Knowledge Distillation from Model Checkpoints
    Wang, Chaofei
    Yang, Qisen
    Huang, Rui
    Song, Shiji
    Huang, Gao
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [4] KNOWLEDGE DISTILLATION FROM LANGUAGE MODEL TO ACOUSTIC MODEL: A HIERARCHICAL MULTI-TASK LEARNING APPROACH
    Lee, Mun-Hak
    Chang, Joon-Hyuk
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 8392 - 8396
  • [5] Spatiotemporal Knowledge Distillation for Efficient Estimation of Aerial Video Saliency
    Li, Jia
    Fu, Kui
    Zhao, Shengwei
    Ge, Shiming
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 (29) : 1902 - 1914
  • [6] Parameter-efficient online knowledge distillation for pretrained language models
    Wang, Yukun
    Wang, Jin
    Zhang, Xuejie
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 265
  • [7] A Task-Efficient Gradient Guide Knowledge Distillation for Pre-train Language Model Compression
    Liu, Xu
    Su, Yila
    Wu, Nier
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT III, ICIC 2024, 2024, 14877 : 366 - 377
  • [8] Uncertainty-Driven Knowledge Distillation for Language Model Compression
    Huang, Tianyu
    Dong, Weisheng
    Wu, Fangfang
    Li, Xin
    Shi, Guangming
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 2850 - 2858
  • [9] An Efficient and Lightweight Approach for Intrusion Detection based on Knowledge Distillation
    Zhao, Ruijie
    Chen, Yu
    Wang, Yijun
    Shi, Yong
    Xue, Zhi
    IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC 2021), 2021,
  • [10] Lifelong Language Knowledge Distillation
    Chuang, Yung-Sung
    Su, Shang-Yu
    Chen, Yun-Nung
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 2914 - 2924