Self-Distillation Based on High-level Information Supervision for Compressing End-to-End ASR Model

Cited: 3
Authors
Xu, Qiang [1 ]
Song, Tongtong [1 ]
Wang, Longbiao [1 ]
Shi, Hao [2 ]
Lin, Yuqin [1 ]
Lv, Yongjie [1 ]
Ge, Meng [1 ]
Yu, Qiang [1 ]
Dang, Jianwu [1 ,3 ]
Affiliations
[1] Tianjin Univ, Coll Intelligence & Comp, Tianjin Key Lab Cognit Comp & Applicat, Tianjin, Peoples R China
[2] Kyoto Univ, Grad Sch Informat, Sakyo Ku, Kyoto, Japan
[3] Japan Adv Inst Sci & Technol, Kanazawa, Ishikawa, Japan
Funding
National Natural Science Foundation of China;
Keywords
automatic speech recognition; self-distillation; teacher-student model; model compression; KNOWLEDGE DISTILLATION; ATTENTION;
DOI
10.21437/Interspeech.2022-11423
CLC Number
O42 [Acoustics];
Subject Classification Codes
070206; 082403;
Abstract
Model compression of ASR aims to reduce the model parameters while causing as little performance degradation as possible. Knowledge Distillation (KD) is an efficient model compression method that transfers knowledge from a large teacher model to a smaller student model. However, most existing KD methods study how to fully utilize the teacher's knowledge without paying attention to the student's own knowledge. In this paper, we explore whether the high-level information of the model itself is helpful for low-level information. We first propose a neighboring feature self-distillation (NFSD) approach to distill the knowledge from the adjacent deeper layer to the shallow one, which shows significant performance improvement. We then further propose an attention-based feature self-distillation (AFSD) approach to exploit more high-level information. Specifically, AFSD fuses the knowledge from multiple deep layers with an attention mechanism and distills it to a shallow one. The experimental results on the AISHELL-1 dataset show that 7.3% and 8.3% relative character error rate (CER) reductions can be achieved with NFSD and AFSD, respectively. In addition, our two proposed approaches can easily be combined with the general teacher-student knowledge distillation method, achieving 12.4% and 13.4% relative CER reductions compared with the baseline student model, respectively.
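The two losses described in the abstract can be sketched as follows. The abstract does not specify the exact formulation, so the MSE feature-matching objective and the similarity-based softmax attention used here are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def nfsd_loss(features):
    """Neighboring feature self-distillation (sketch): each shallow layer
    output is pulled toward the output of its adjacent deeper layer,
    which acts as the teacher. `features` is a list of (T, D) layer
    outputs ordered shallow -> deep."""
    pairs = list(zip(features[:-1], features[1:]))
    return sum(np.mean((s - d) ** 2) for s, d in pairs) / len(pairs)

def afsd_loss(shallow, deep_layers):
    """Attention-based feature self-distillation (sketch): fuse several
    deep-layer outputs into one target via attention weights, here scored
    by (negative) distance to the shallow layer, then distill the fused
    target to the shallow layer."""
    scores = np.array([-np.mean((shallow - d) ** 2) for d in deep_layers])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()  # softmax over the deep layers
    fused = sum(w * d for w, d in zip(weights, deep_layers))
    return np.mean((shallow - fused) ** 2)
```

In training, either loss would be added to the usual ASR objective; the deeper-layer targets are typically detached from the gradient so that only the shallow layers are pushed toward them.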
Pages: 1716-1720
Page count: 5
Related Papers
47 records
  • [31] CITIROC high-level analog front-end model implementation and simulations
    1600, North Atlantic University Union NAUN (08):
  • [32] COMPRESSING TRANSFORMER-BASED ASR MODEL BY TASK-DRIVEN LOSS AND ATTENTION-BASED MULTI-LEVEL FEATURE DISTILLATION
    Lv, Yongjie
    Wang, Longbiao
    Ge, Meng
    Li, Sheng
    Ding, Chenchen
    Pan, Lixin
    Wang, Yuguang
    Dang, Jianwu
    Honda, Kiyoshi
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7992 - 7996
  • [34] Longitudinal and Lateral Coupling Model Based End-to-End Learning for Lane Keeping of Self-driving Cars
    Yuan, Wei
    Yang, Ming
    Wang, Chunxiang
    Wang, Bing
    COGNITIVE SYSTEMS AND SIGNAL PROCESSING, PT II, 2019, 1006 : 425 - 436
  • [35] FC-DETR: High-precision end-to-end surface defect detector based on foreground supervision and cascade refined hybrid matching
    Xia, Zilin
    Zhao, Yufan
    Gu, Jinan
    Wang, Wenbo
    Zhang, Wenhao
    Huang, Zedong
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 266
  • [36] EDC-DTI: An end-to-end deep collaborative learning model based on multiple information for drug-target interactions prediction
    Yuan, Yongna
    Zhang, Yuhao
    Meng, Xiangbo
    Liu, Zhenyu
    Wang, Bohan
    Miao, Ruidong
    Zhang, Ruisheng
    Su, Wei
    Liu, Lei
    JOURNAL OF MOLECULAR GRAPHICS & MODELLING, 2023, 122
  • [37] End-to-End Self-organizing Intelligent Security Model for Wireless Sensor Network based on a Hybrid (AES-RSA) Cryptography
    Devi, V. Anusuya
    Sampradeepraj, T.
    WIRELESS PERSONAL COMMUNICATIONS, 2024, 136 (03) : 1675 - 1703
  • [38] Extraction of Information Related to Adverse Drug Events from Electronic Health Record Notes: Design of an End-to-End Model Based on Deep Learning
    Li, Fei
    Liu, Weisong
    Yu, Hong
    JMIR MEDICAL INFORMATICS, 2018, 6 (04) : 32 - 45
  • [39] Multiyear Mapping of Water Demand at Crop Level: An End-to-End Workflow Based on High-Resolution Crop Type Maps and Meteorological Data
    Weikmann, Giulio
    Marinelli, Daniele
    Paris, Claudia
    Migdall, Silke
    Gleisberg, Eva
    Appel, Florian
    Bach, Heike
    Dowling, Jim
    Bruzzone, Lorenzo
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2023, 16 : 6758 - 6775
  • [40] HCT-Det: A High-Accuracy End-to-End Model for Steel Defect Detection Based on Hierarchical CNN-Transformer Features
    Chen, Xiyin
    Zhang, Xiaohu
    Shi, Yonghua
    Pang, Junjie
    SENSORS, 2025, 25 (05)