Self-Distillation Based on High-level Information Supervision for Compressing End-to-End ASR Model

Cited by: 3
Authors
Xu, Qiang [1 ]
Song, Tongtong [1 ]
Wang, Longbiao [1 ]
Shi, Hao [2 ]
Lin, Yuqin [1 ]
Lv, Yongjie [1 ]
Ge, Meng [1 ]
Yu, Qiang [1 ]
Dang, Jianwu [1 ,3 ]
Affiliations
[1] Tianjin Univ, Coll Intelligence & Comp, Tianjin Key Lab Cognit Comp & Applicat, Tianjin, Peoples R China
[2] Kyoto Univ, Grad Sch Informat, Sakyo Ku, Kyoto, Japan
[3] Japan Adv Inst Sci & Technol, Kanazawa, Ishikawa, Japan
Source
INTERSPEECH 2022
Funding
National Natural Science Foundation of China
Keywords
automatic speech recognition; self-distillation; teacher-student model; model compression; KNOWLEDGE DISTILLATION; ATTENTION;
DOI
10.21437/Interspeech.2022-11423
CLC Number: O42 [Acoustics]
Discipline Codes: 070206; 082403
Abstract
Model compression of ASR aims to reduce the model parameters while bringing as little performance degradation as possible. Knowledge Distillation (KD) is an efficient model compression method that transfers knowledge from a large teacher model to a smaller student model. However, most existing KD methods study how to fully utilize the teacher's knowledge without paying attention to the student's own knowledge. In this paper, we explore whether the high-level information of the model itself is helpful for its low-level information. We first propose a neighboring feature self-distillation (NFSD) approach that distills knowledge from the adjacent deeper layer to the shallow one, which shows significant performance improvement. We then propose an attention-based feature self-distillation (AFSD) approach to exploit more high-level information: AFSD fuses the knowledge from multiple deep layers with an attention mechanism and distills it to a shallow layer. Experimental results on the AISHELL-1 dataset show that NFSD and AFSD achieve 7.3% and 8.3% relative character error rate (CER) reduction, respectively. In addition, the two proposed approaches can be easily combined with the general teacher-student knowledge distillation method, achieving 12.4% and 13.4% relative CER reduction compared with the baseline student model, respectively.
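The two self-distillation ideas described in the abstract can be illustrated with a compact sketch. The PyTorch snippet below is a minimal, hypothetical illustration rather than the authors' implementation: the names (nfsd_loss, AFSDFusion), the MSE distillation objective, the dot-product attention, and the choice of which layers are paired are assumptions made for clarity; the paper's exact projections, layer selection, and loss weighting may differ.

```python
# Minimal sketch of NFSD and AFSD losses over encoder layer outputs.
# All names and design details here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


def nfsd_loss(hidden_states):
    """Neighboring feature self-distillation: each shallow layer is pushed
    toward its adjacent deeper layer (deeper side treated as a fixed target)."""
    loss = 0.0
    for shallow, deep in zip(hidden_states[:-1], hidden_states[1:]):
        loss = loss + F.mse_loss(shallow, deep.detach())
    return loss / (len(hidden_states) - 1)


class AFSDFusion(nn.Module):
    """Attention-based feature self-distillation: fuse several deep layers
    into one target via attention, then distill it to a shallow layer."""

    def __init__(self, dim):
        super().__init__()
        self.query_proj = nn.Linear(dim, dim)
        self.key_proj = nn.Linear(dim, dim)

    def forward(self, shallow, deep_layers):
        # deep_layers: list of (batch, time, dim) tensors from deeper blocks.
        deep = torch.stack([d.detach() for d in deep_layers], dim=2)   # (B, T, L, D)
        q = self.query_proj(shallow).unsqueeze(2)                      # (B, T, 1, D)
        k = self.key_proj(deep)                                        # (B, T, L, D)
        attn = torch.softmax((q * k).sum(-1) / deep.size(-1) ** 0.5, dim=-1)  # (B, T, L)
        fused = (attn.unsqueeze(-1) * deep).sum(dim=2)                 # (B, T, D)
        return F.mse_loss(shallow, fused)


if __name__ == "__main__":
    # Toy usage: outputs of 6 encoder layers for a batch of 2, 50 frames, dim 256.
    feats = [torch.randn(2, 50, 256) for _ in range(6)]
    print("NFSD loss:", nfsd_loss(feats).item())
    afsd = AFSDFusion(256)
    # Distill the attention-weighted fusion of the three deepest layers into layer 2.
    print("AFSD loss:", afsd(feats[2], feats[3:]).item())
```

In a full training setup, such losses would presumably be added with tunable weights to the student's usual ASR training objective; the specific weighting and layer pairing used in the paper are not reproduced here.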
Pages: 1716 - 1720
Page count: 5