Knowledge Transfer and Distillation from Autoregressive to Non-Autoregressive Speech Recognition

Cited by: 1
Authors
Gong, Xun [1 ]
Zhou, Zhikai [1 ]
Qian, Yanmin [1 ]
Affiliations
[1] Shanghai Jiao Tong Univ, MoE Key Lab of Artificial Intelligence, AI Inst, X-LANCE Lab, Dept Comp Sci & Engn, Shanghai, Peoples R China
Keywords
knowledge transfer; knowledge distillation; nonautoregressive; end-to-end; speech recognition;
DOI
10.21437/Interspeech.2022-632
CLC Classification Number
O42 [Acoustics];
Subject Classification Codes
070206; 082403;
Abstract
Modern non-autoregressive (NAR) speech recognition systems aim to accelerate inference; however, they suffer from performance degradation compared with autoregressive (AR) models, as well as from large model sizes. We propose a novel knowledge transfer and distillation architecture that leverages knowledge from AR models to improve NAR performance while reducing the model's size. Frame- and sequence-level objectives are designed for transfer learning. To further boost NAR performance, a beam search method on Mask-CTC is developed to enlarge the search space during the inference stage. Experiments show that the proposed NAR beam search obtains a relative CER reduction of over 5% on the AISHELL-1 benchmark with a tolerable real-time-factor (RTF) increase. With knowledge transfer, the NAR student of the same size as the AR teacher obtains relative CER reductions of 8%/16% on the AISHELL-1 dev/test sets, and over 25% relative WER reduction on the LibriSpeech test-clean/other sets. Moreover, the ~9x smaller NAR models achieve ~25% relative CER/WER reductions on both the AISHELL-1 and LibriSpeech benchmarks with the proposed knowledge transfer and distillation.
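The frame-level transfer objective mentioned in the abstract can be illustrated as a standard knowledge-distillation loss: a KL divergence between the AR teacher's and the NAR student's per-frame output distributions. This is a minimal sketch, not the paper's exact formulation; the function name, temperature value, and tensor shapes are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def frame_level_kd_loss(student_logits: torch.Tensor,
                        teacher_logits: torch.Tensor,
                        temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between teacher and student per-frame posteriors.

    Both inputs are (batch, frames, vocab) logits. The temperature softens
    both distributions; scaling by t^2 keeps gradient magnitudes comparable
    to a hard-label loss (standard distillation practice).
    """
    t = temperature
    log_p_student = F.log_softmax(student_logits / t, dim=-1)
    p_teacher = F.softmax(teacher_logits / t, dim=-1)
    # kl_div expects log-probabilities as input and probabilities as target
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (t * t)

# Toy example: 2 utterances, 50 frames, 100-token vocabulary
student = torch.randn(2, 50, 100)
teacher = torch.randn(2, 50, 100)
loss = frame_level_kd_loss(student, teacher)
```

In practice this frame-level term would be combined with a sequence-level objective and the student's own CTC/Mask-CTC training loss; the weighting between them is a tuning choice.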
Pages: 2618-2622 (5 pages)
Related Papers
50 records in total
  • [1] Non-Autoregressive Transformer for Speech Recognition
    Chen, Nanxin
    Watanabe, Shinji
    Villalba, Jesus
    Zelasko, Piotr
    Dehak, Najim
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2021, 28 : 121 - 125
  • [2] Hybrid Autoregressive and Non-Autoregressive Transformer Models for Speech Recognition
    Tian, Zhengkun
    Yi, Jiangyan
    Tao, Jianhua
    Zhang, Shuai
    Wen, Zhengqi
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 762 - 766
  • [3] Pushing the Limits of Non-Autoregressive Speech Recognition
    Ng, Edwin G.
    Chiu, Chung-Cheng
    Zhang, Yu
    Chan, William
    [J]. INTERSPEECH 2021, 2021, : 3725 - 3729
  • [4] Non-Autoregressive Speech Recognition with Error Correction Module
    Qian, Yukun
    Zhuang, Xuyi
    Zhang, Zehua
    Zhou, Lianyu
    Lin, Xu
    Wang, Mingjiang
    [J]. PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022, : 1103 - 1108
  • [5] Multilingual Non-Autoregressive Machine Translation without Knowledge Distillation
    Huang, Chenyang
    Huang, Fei
    Zheng, Zaixiang
    Zaiane, Osmar
    Zhou, Hao
    Mou, Lili
    [J]. 13TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING AND THE 3RD CONFERENCE OF THE ASIA-PACIFIC CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, IJCNLP-AACL 2023, 2023, : 161 - 170
  • [6] An Improved Single Step Non-autoregressive Transformer for Automatic Speech Recognition
    Fan, Ruchao
    Chu, Wei
    Chang, Peng
    Xiao, Jing
    Alwan, Abeer
    [J]. INTERSPEECH 2021, 2021, : 3715 - 3719
  • [7] NON-AUTOREGRESSIVE TRANSFORMER WITH UNIFIED BIDIRECTIONAL DECODER FOR AUTOMATIC SPEECH RECOGNITION
    Zhang, Chuan-Fei
    Liu, Yan
    Zhang, Tian-Hao
    Chen, Song-Lu
    Chen, Feng
    Yin, Xu-Cheng
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6527 - 6531
  • [8] NON-AUTOREGRESSIVE MANDARIN-ENGLISH CODE-SWITCHING SPEECH RECOGNITION
    Chuang, Shun-Po
    Chang, Heng-Jui
    Huang, Sung-Feng
    Lee, Hung-yi
    [J]. 2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 465 - 472
  • [9] Align-Refine: Non-Autoregressive Speech Recognition via Iterative Realignment
    Chi, Ethan A.
    Salazar, Julian
    Kirchhoff, Katrin
    [J]. 2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 1920 - 1927
  • [10] Improving Autoregressive NMT with Non-Autoregressive Model
    Zhou, Long
    Zhang, Jiajun
    Zong, Chengqing
    [J]. WORKSHOP ON AUTOMATIC SIMULTANEOUS TRANSLATION CHALLENGES, RECENT ADVANCES, AND FUTURE DIRECTIONS, 2020, : 24 - 29