Self-Distillation Based on High-level Information Supervision for Compressing End-to-End ASR Model

Cited by: 3
Authors
Xu, Qiang [1 ]
Song, Tongtong [1 ]
Wang, Longbiao [1 ]
Shi, Hao [2 ]
Lin, Yuqin [1 ]
Lv, Yongjie [1 ]
Ge, Meng [1 ]
Yu, Qiang [1 ]
Dang, Jianwu [1 ,3 ]
Affiliations
[1] Tianjin Univ, Coll Intelligence & Comp, Tianjin Key Lab Cognit Comp & Applicat, Tianjin, Peoples R China
[2] Kyoto Univ, Grad Sch Informat, Sakyo Ku, Kyoto, Japan
[3] Japan Adv Inst Sci & Technol, Kanazawa, Ishikawa, Japan
Source
INTERSPEECH 2022
Funding
National Natural Science Foundation of China;
Keywords
automatic speech recognition; self-distillation; teacher-student model; model compression; KNOWLEDGE DISTILLATION; ATTENTION;
DOI
10.21437/Interspeech.2022-11423
CLC Classification
O42 [Acoustics];
Discipline Codes
070206; 082403;
Abstract
Model compression for ASR aims to reduce the model parameters while introducing as little performance degradation as possible. Knowledge Distillation (KD) is an efficient model compression method that transfers the knowledge from a large teacher model to a smaller student model. However, most existing KD methods study how to fully utilize the teacher's knowledge without paying attention to the student's own knowledge. In this paper, we explore whether the high-level information of the model itself is helpful for its low-level information. We first propose a neighboring feature self-distillation (NFSD) approach that distills the knowledge from the adjacent deeper layer to the shallow one, which shows significant performance improvement. We therefore further propose an attention-based feature self-distillation (AFSD) approach to exploit more high-level information. Specifically, AFSD fuses the knowledge from multiple deep layers with an attention mechanism and distills it to a shallow layer. Experimental results on the AISHELL-1 dataset show that NFSD and AFSD achieve 7.3% and 8.3% relative character error rate (CER) reduction, respectively. In addition, the two proposed approaches can easily be combined with the general teacher-student knowledge distillation method, achieving 12.4% and 13.4% relative CER reduction over the baseline student model, respectively.
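The abstract describes two self-distillation losses: NFSD, which supervises each shallow encoder layer with the output of its adjacent deeper layer, and AFSD, which fuses several deeper layers with an attention mechanism before distilling into a shallow layer. The sketch below is a minimal, hypothetical PyTorch illustration of how such losses could be computed; the layer-output interface, the MSE distance, the similarity-based attention weights, and the names `nfsd_loss` / `afsd_loss` are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of NFSD / AFSD self-distillation losses (not the authors' code).
import torch
import torch.nn.functional as F


def nfsd_loss(layer_outputs):
    """Neighboring feature self-distillation: each shallow layer is pushed
    toward the output of the next (deeper) layer; the deeper layer is detached
    so only the shallow layer receives the distillation gradient."""
    loss = 0.0
    for shallow, deep in zip(layer_outputs[:-1], layer_outputs[1:]):
        loss = loss + F.mse_loss(shallow, deep.detach())
    return loss / (len(layer_outputs) - 1)


def afsd_loss(layer_outputs, num_deep=3):
    """Attention-based feature self-distillation: the last `num_deep` layers are
    fused with attention weights derived from their similarity to a shallower
    target layer, and the fused feature supervises that shallow layer."""
    shallow = layer_outputs[-(num_deep + 1)]               # (B, T, D)
    deep = torch.stack(layer_outputs[-num_deep:], dim=1)   # (B, K, T, D)
    # Similarity between the shallow layer and each deep layer -> attention weights.
    scores = (deep.detach() * shallow.unsqueeze(1)).mean(dim=(-2, -1))        # (B, K)
    attn = torch.softmax(scores, dim=1)                                       # (B, K)
    fused = (attn.unsqueeze(-1).unsqueeze(-1) * deep.detach()).sum(dim=1)     # (B, T, D)
    return F.mse_loss(shallow, fused)


if __name__ == "__main__":
    # Toy encoder outputs: 6 layers, batch 2, 50 frames, 256-dim features.
    feats = [torch.randn(2, 50, 256) for _ in range(6)]
    print(nfsd_loss(feats).item(), afsd_loss(feats).item())
```

In a full training recipe these terms would presumably be added, with tuning weights, to the student's usual ASR objective (and, as the abstract notes, can be combined with standard teacher-student KD); the `.detach()` calls in this sketch ensure that only the shallow layers are updated by the distillation gradient.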
Pages: 1716 - 1720
Number of pages: 5