Knowledge Transfer and Distillation from Autoregressive to Non-Autoregressive Speech Recognition

Authors
Gong, Xun [1]
Zhou, Zhikai [1]
Qian, Yanmin [1]
Affiliations
[1] Shanghai Jiao Tong Univ, MoE Key Lab Artificial Intelligence, AI Inst, X-LANCE Lab, Dept Comp Sci & Engn, Shanghai, Peoples R China
Source
Interspeech 2022
Keywords
knowledge transfer; knowledge distillation; nonautoregressive; end-to-end; speech recognition;
DOI
10.21437/Interspeech.2022-632
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Modern non-autoregressive (NAR) speech recognition systems aim to accelerate inference; however, they suffer from performance degradation compared with autoregressive (AR) models, as well as from large model sizes. We propose a novel knowledge transfer and distillation architecture that leverages knowledge from AR models to improve NAR performance while reducing the model's size. Frame- and sequence-level objectives are designed for transfer learning. To further boost NAR performance, a beam search method on Mask-CTC is developed to enlarge the search space during inference. Experiments show that the proposed NAR beam search reduces CER by over 5% relative on the AISHELL-1 benchmark with a tolerable increase in real-time factor (RTF). With knowledge transfer, an NAR student of the same size as the AR teacher obtains relative CER reductions of 8%/16% on the AISHELL-1 dev/test sets, and relative WER reductions of over 25% on the LibriSpeech test-clean/other sets. Moreover, NAR models that are ~9x smaller achieve ~25% relative CER/WER reductions on both the AISHELL-1 and LibriSpeech benchmarks with the proposed knowledge transfer and distillation.
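The abstract does not state the exact form of the frame- and sequence-level objectives. As a rough illustration of what a frame-level distillation objective can look like in general, the sketch below computes the mean KL divergence between teacher and student per-frame output distributions. This is an assumption for illustration only, not the authors' loss; the function names `softmax` and `frame_level_kd_loss` and the list-of-logits input layout are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one frame's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def frame_level_kd_loss(teacher_logits, student_logits):
    """Mean KL(teacher || student) over frames.

    Both arguments are lists of frames; each frame is a list of logits
    over the same vocabulary. Generic distillation sketch, not the
    objective used in the paper.
    """
    if not teacher_logits or len(teacher_logits) != len(student_logits):
        raise ValueError("need the same, non-zero number of frames")
    total = 0.0
    for t_frame, s_frame in zip(teacher_logits, student_logits):
        p = softmax(t_frame)  # teacher posterior (soft target)
        q = softmax(s_frame)  # student posterior
        total += sum(
            pi * (math.log(pi) - math.log(qi))
            for pi, qi in zip(p, q)
            if pi > 0.0
        )
    return total / len(teacher_logits)
```

The loss is zero when the student reproduces the teacher's per-frame distributions exactly and grows as they diverge. In practice such a term would typically be weighted and combined with the student's own training loss.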
Pages: 2618-2622
Number of pages: 5