Self-Distillation Based on High-level Information Supervision for Compressing End-to-End ASR Model

Cited by: 3
Authors
Xu, Qiang [1 ]
Song, Tongtong [1 ]
Wang, Longbiao [1 ]
Shi, Hao [2 ]
Lin, Yuqin [1 ]
Lv, Yongjie [1 ]
Ge, Meng [1 ]
Yu, Qiang [1 ]
Dang, Jianwu [1 ,3 ]
Affiliations
[1] Tianjin Univ, Coll Intelligence & Comp, Tianjin Key Lab Cognit Comp & Applicat, Tianjin, Peoples R China
[2] Kyoto Univ, Grad Sch Informat, Sakyo Ku, Kyoto, Japan
[3] Japan Adv Inst Sci & Technol, Kanazawa, Ishikawa, Japan
Source
INTERSPEECH 2022
Funding
National Natural Science Foundation of China;
Keywords
automatic speech recognition; self-distillation; teacher-student model; model compression; KNOWLEDGE DISTILLATION; ATTENTION;
DOI
10.21437/Interspeech.2022-11423
CLC Classification
O42 [Acoustics];
Discipline Codes
070206; 082403;
Abstract
Model compression for ASR aims to reduce the model parameters while introducing as little performance degradation as possible. Knowledge Distillation (KD) is an efficient model compression method that transfers the knowledge from a large teacher model to a smaller student model. However, most existing KD methods study how to fully utilize the teacher's knowledge without paying attention to the student's own knowledge. In this paper, we explore whether the high-level information of the model itself is helpful for its low-level information. We first propose a neighboring feature self-distillation (NFSD) approach that distills the knowledge from the adjacent deeper layer to the shallow one, which shows significant performance improvement. We therefore further propose an attention-based feature self-distillation (AFSD) approach to exploit more high-level information. Specifically, AFSD fuses the knowledge from multiple deep layers with an attention mechanism and distills it to a shallow layer. Experimental results on the AISHELL-1 dataset show that NFSD and AFSD achieve 7.3% and 8.3% relative character error rate (CER) reduction, respectively. In addition, the two proposed approaches can easily be combined with the general teacher-student knowledge distillation method, achieving 12.4% and 13.4% relative CER reduction over the baseline student model, respectively.
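The abstract describes two self-distillation losses: NFSD, which supervises each shallow encoder layer with the output of its adjacent deeper layer, and AFSD, which fuses several deeper layers with an attention mechanism before distilling into a shallow layer. The sketch below is a minimal, hypothetical PyTorch illustration of how such losses could be computed; the layer-output interface, the MSE distance, the similarity-based attention weights, and the names `nfsd_loss` / `afsd_loss` are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of NFSD / AFSD self-distillation losses (not the authors' code).
import torch
import torch.nn.functional as F


def nfsd_loss(layer_outputs):
    """Neighboring feature self-distillation: each shallow layer is pushed
    toward the output of the next (deeper) layer; the deeper layer is detached
    so only the shallow layer receives the distillation gradient."""
    loss = 0.0
    for shallow, deep in zip(layer_outputs[:-1], layer_outputs[1:]):
        loss = loss + F.mse_loss(shallow, deep.detach())
    return loss / (len(layer_outputs) - 1)


def afsd_loss(layer_outputs, num_deep=3):
    """Attention-based feature self-distillation: the last `num_deep` layers are
    fused with attention weights derived from their similarity to a shallower
    target layer, and the fused feature supervises that shallow layer."""
    shallow = layer_outputs[-(num_deep + 1)]               # (B, T, D)
    deep = torch.stack(layer_outputs[-num_deep:], dim=1)   # (B, K, T, D)
    # Similarity between the shallow layer and each deep layer -> attention weights.
    scores = (deep.detach() * shallow.unsqueeze(1)).mean(dim=(-2, -1))        # (B, K)
    attn = torch.softmax(scores, dim=1)                                       # (B, K)
    fused = (attn.unsqueeze(-1).unsqueeze(-1) * deep.detach()).sum(dim=1)     # (B, T, D)
    return F.mse_loss(shallow, fused)


if __name__ == "__main__":
    # Toy encoder outputs: 6 layers, batch 2, 50 frames, 256-dim features.
    feats = [torch.randn(2, 50, 256) for _ in range(6)]
    print(nfsd_loss(feats).item(), afsd_loss(feats).item())
```

In a full training recipe these terms would presumably be added, with tuning weights, to the student's usual ASR objective (and, as the abstract notes, can be combined with standard teacher-student KD); the `.detach()` calls in this sketch ensure that only the shallow layers are updated by the distillation gradient.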
Pages: 1716 - 1720
Number of pages: 5