END-TO-END SPEECH RECOGNITION WITH WORD-BASED RNN LANGUAGE MODELS

被引:0
|
作者
Hori, Takaaki [1 ]
Cho, Jaejin [2 ]
Watanabe, Shinji [2 ]
机构
[1] MERL, Cambridge, MA 02139 USA
[2] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA
关键词
End-to-end speech recognition; language modeling; decoding; connectionist temporal classification; attention decoder;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
This paper investigates the impact of word-based RNN language models (RNN-LMs) on the performance of end-to-end automatic speech recognition (ASR). In our prior work, we have proposed a multi-level LM, in which character-based and word-based RNN-LMs are combined in hybrid CTC/attention-based ASR. Although this multi-level approach achieves significant error reduction in the Wall Street Journal (WSJ) task, two different LMs need to be trained and used for decoding, which increase the computational cost and memory usage. In this paper, we further propose a novel word-based RNN-LM, which allows us to decode with only the word-based LM, where it provides look-ahead word probabilities to predict next characters instead of the character-based LM, leading competitive accuracy with less computation compared to the multi-level LM. We demonstrate the efficacy of the word-based RNN-LMs using a larger corpus, LibriSpeech, in addition to WSJ we used in the prior work. Furthermore, we show that the proposed model achieves 5.1 % WER for WSJ Eval'92 test set when the vocabulary size is increased, which is the best WER reported for end-to-end ASR systems on this benchmark.
引用
收藏
页码:389 / 396
页数:8
相关论文
共 50 条
  • [31] END-TO-END SPEECH RECOGNITION FROM FEDERATED ACOUSTIC MODELS
    Gao, Yan
    Parcollet, Titouan
    Zaiem, Salah
    Fernandez-Marques, Javier
    de Gusmao, Pedro P. B.
    Beutel, Daniel J.
    Lane, Nicholas D.
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7227 - 7231
  • [32] LANGUAGE INDEPENDENT END-TO-END ARCHITECTURE FOR JOINT LANGUAGE IDENTIFICATION AND SPEECH RECOGNITION
    Watanabe, Shinji
    Hori, Takaaki
    Hershey, John R.
    2017 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2017, : 265 - 271
  • [33] Online Continual Learning of End-to-End Speech Recognition Models
    Yang, Muqiao
    Lane, Ian
    Watanabe, Shinji
    INTERSPEECH 2022, 2022, : 2668 - 2672
  • [34] END-TO-END CONTEXTUAL SPEECH RECOGNITION USING CLASS LANGUAGE MODELS AND A TOKEN PASSING DECODER
    Chen, Zhehuai
    Jain, Mahaveer
    Wang, Yongqiang
    Seltzer, Michael L.
    Fuegen, Christian
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6186 - 6190
  • [35] IMPROVING HYBRID CTC/ATTENTION END-TO-END SPEECH RECOGNITION WITH PRETRAINED ACOUSTIC AND LANGUAGE MODELS
    Deng, Keqi
    Cao, Songjun
    Zhang, Yike
    Ma, Long
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 76 - 82
  • [36] A study of transformer-based end-to-end speech recognition system for Kazakh language
    Mamyrbayev, Orken
    Oralbekova, Dina
    Alimhan, Keylan
    Turdalykyzy, Tolganay
    Othman, Mohamed
    SCIENTIFIC REPORTS, 2022, 12 (01)
  • [37] A study of transformer-based end-to-end speech recognition system for Kazakh language
    Mamyrbayev Orken
    Oralbekova Dina
    Alimhan Keylan
    Turdalykyzy Tolganay
    Othman Mohamed
    Scientific Reports, 12
  • [38] End-to-end multilingual speech recognition system with language supervision training
    Liu, Danyang
    Xu, Ji
    Zhang, Pengyuan
    IEICE Transactions on Information and Systems, 2020, E103D (06) : 1427 - 1430
  • [39] Streaming End-to-End Multilingual Speech Recognition with Joint Language Identification
    Zhang, C.
    Li, B.
    Sainath, T. N.
    Strohman, T.
    Mavandadi, S.
    Chang, S.
    Haghani, P.
    INTERSPEECH 2022, 2022, : 3223 - 3227
  • [40] End-to-End Multilingual Speech Recognition System with Language Supervision Training
    Liu, Danyang
    Xu, Ji
    Zhang, Pengyuan
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2020, E103D (06): : 1427 - 1430