Self-Distillation Based on High-level Information Supervision for Compressing End-to-End ASR Model

Cited by: 3
Authors
Xu, Qiang [1]
Song, Tongtong [1]
Wang, Longbiao [1]
Shi, Hao [2]
Lin, Yuqin [1]
Lv, Yongjie [1]
Ge, Meng [1]
Yu, Qiang [1]
Dang, Jianwu [1,3]
Affiliations
[1] Tianjin Univ, Coll Intelligence & Comp, Tianjin Key Lab Cognit Comp & Applicat, Tianjin, Peoples R China
[2] Kyoto Univ, Grad Sch Informat, Sakyo Ku, Kyoto, Japan
[3] Japan Adv Inst Sci & Technol, Kanazawa, Ishikawa, Japan
Source
INTERSPEECH 2022
Funding
National Natural Science Foundation of China;
Keywords
automatic speech recognition; self-distillation; teacher-student model; model compression; KNOWLEDGE DISTILLATION; ATTENTION;
DOI
10.21437/Interspeech.2022-11423
Chinese Library Classification (CLC)
O42 [Acoustics];
Discipline codes
070206; 082403;
Abstract
Model compression for ASR aims to reduce the number of model parameters while causing as little performance degradation as possible. Knowledge Distillation (KD) is an efficient model compression method that transfers knowledge from a large teacher model to a smaller student model. However, most existing KD methods study how to fully utilize the teacher's knowledge without paying attention to the student's own knowledge. In this paper, we explore whether the high-level information of the model itself is helpful for its low-level information. We first propose a neighboring feature self-distillation (NFSD) approach that distills knowledge from the adjacent deeper layer to the shallower one, which yields a significant performance improvement. We therefore further propose an attention-based feature self-distillation (AFSD) approach to exploit more high-level information: AFSD fuses the knowledge from multiple deep layers with an attention mechanism and distills it into a shallow layer. Experimental results on the AISHELL-1 dataset show that NFSD and AFSD achieve 7.3% and 8.3% relative character error rate (CER) reduction, respectively. In addition, the two proposed approaches can easily be combined with the general teacher-student knowledge distillation method, achieving 12.4% and 13.4% relative CER reduction over the baseline student model, respectively.
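The following is a minimal PyTorch sketch of the two self-distillation losses described in the abstract, not the authors' implementation: the function names (nfsd_loss, afsd_loss), the use of mean-squared error, and the similarity-based attention weighting are illustrative assumptions, and encoder features are assumed to have shape (batch, time, dim).

import torch
import torch.nn.functional as F

def nfsd_loss(layer_feats):
    """Neighboring feature self-distillation (sketch): each layer is pushed
    toward its adjacent deeper layer; the deeper feature is detached so the
    gradient only updates the shallower layer."""
    loss = 0.0
    for shallow, deep in zip(layer_feats[:-1], layer_feats[1:]):
        loss = loss + F.mse_loss(shallow, deep.detach())
    return loss / (len(layer_feats) - 1)

def afsd_loss(shallow_feat, deep_feats):
    """Attention-based feature self-distillation (sketch): several deeper
    layers are fused with attention weights derived from their similarity to
    the shallow layer, and the fused feature is distilled into the shallow one."""
    deep = torch.stack([f.detach() for f in deep_feats], dim=0)    # (L, B, T, D)
    scores = torch.stack(
        [(shallow_feat * d).mean(dim=-1, keepdim=True) for d in deep], dim=0
    )                                                              # (L, B, T, 1)
    weights = torch.softmax(scores, dim=0)                         # attention over deep layers
    fused = (weights * deep).sum(dim=0)                            # (B, T, D)
    return F.mse_loss(shallow_feat, fused)

# Usage example with random features from a hypothetical 6-layer encoder:
feats = [torch.randn(2, 50, 256) for _ in range(6)]
total = nfsd_loss(feats) + afsd_loss(feats[1], feats[3:])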
Pages: 1716 - 1720
Number of pages: 5