DISTILLING KNOWLEDGE FROM ENSEMBLES OF ACOUSTIC MODELS FOR JOINT CTC-ATTENTION END-TO-END SPEECH RECOGNITION

Cited by: 3
Authors
Gao, Yan [1]
Parcollet, Titouan [1, 2]
Lane, Nicholas D. [1, 3]
Affiliations
[1] Univ Cambridge, Cambridge, England
[2] Avignon Univ, Avignon, France
[3] Samsung AI, Cambridge, England
Keywords
End-to-end speech recognition; attention models; CTC; multi-teacher knowledge distillation; DISTILLATION; LIBRISPEECH;
DOI
10.1109/ASRU51503.2021.9688302
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Knowledge distillation has been widely used to compress existing deep learning models while preserving performance on a wide range of applications. In the specific context of Automatic Speech Recognition (ASR), distillation from ensembles of acoustic models has recently shown promising results in increasing recognition performance. In this paper, we propose an extension of multi-teacher distillation methods to joint CTC-attention end-to-end ASR systems. We also introduce three novel distillation strategies. The core intuition behind them is to integrate the error rate metric into the teacher selection rather than focusing solely on the observed losses. In this way, we directly distill and optimize the student toward the metric that matters for speech recognition. We evaluate these strategies under a selection of training procedures on different datasets (TIMIT, Librispeech, Common Voice) and various languages (English, French, Italian). In particular, state-of-the-art error rates are reported on the Common Voice French, Italian and TIMIT datasets.
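The abstract does not spell out the three strategies, so the following is only a minimal sketch of the general idea of using the error rate, rather than the loss, to choose among teachers. It assumes each teacher has already produced a decoded hypothesis for an utterance and simply picks the teacher whose hypothesis has the lowest word error rate against the reference. The function names (`edit_distance`, `error_rate`, `select_teacher`) and the example sentences are hypothetical and are not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """Edit distance normalised by the reference length (WER on word lists)."""
    return edit_distance(ref, hyp) / max(len(ref), 1)


def select_teacher(reference, teacher_hypotheses):
    """Return the index of the lowest-error teacher and all error rates.

    Assumption: one illustrative selection rule (pick the single best
    teacher per utterance); the paper's actual strategies may differ.
    """
    rates = [error_rate(reference, hyp) for hyp in teacher_hypotheses]
    best = min(range(len(rates)), key=rates.__getitem__)
    return best, rates


# Hypothetical utterance with three teacher hypotheses.
reference = "the cat sat on the mat".split()
hypotheses = [
    "the cat sat on a mat".split(),
    "the cat sat on the mat".split(),
    "a cat sit on mat".split(),
]
best, rates = select_teacher(reference, hypotheses)
```

In a training loop, the selected teacher's output distribution for that utterance would then serve as the soft target for the student; how the CTC and attention branches are each distilled is left out of this sketch.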
Pages: 138-145
Page count: 8