END-TO-END SPEECH RECOGNITION WITH WORD-BASED RNN LANGUAGE MODELS

Cited: 0
Authors
Hori, Takaaki [1 ]
Cho, Jaejin [2 ]
Watanabe, Shinji [2 ]
Affiliations
[1] MERL, Cambridge, MA 02139 USA
[2] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA
Keywords
End-to-end speech recognition; language modeling; decoding; connectionist temporal classification; attention decoder
DOI
Not available
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper investigates the impact of word-based RNN language models (RNN-LMs) on the performance of end-to-end automatic speech recognition (ASR). In our prior work, we proposed a multi-level LM, in which character-based and word-based RNN-LMs are combined in hybrid CTC/attention-based ASR. Although this multi-level approach achieves significant error reduction on the Wall Street Journal (WSJ) task, two different LMs need to be trained and used for decoding, which increases the computational cost and memory usage. In this paper, we further propose a novel word-based RNN-LM that allows decoding with only the word-based LM: it provides look-ahead word probabilities to predict the next characters in place of the character-based LM, yielding accuracy competitive with the multi-level LM at lower computational cost. We demonstrate the efficacy of the word-based RNN-LMs on a larger corpus, LibriSpeech, in addition to the WSJ corpus used in our prior work. Furthermore, we show that the proposed model achieves 5.1% WER on the WSJ Eval'92 test set when the vocabulary size is increased, which is the best WER reported for end-to-end ASR systems on this benchmark.
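The look-ahead mechanism described in the abstract can be illustrated with a toy sketch (an assumption for illustration, not the paper's exact decoder): given a word-level LM distribution over the vocabulary and the characters emitted so far in the current word, the probability of the next character is the probability mass of vocabulary words consistent with the extended prefix, normalized by the mass of words consistent with the current prefix.

```python
def lookahead_char_prob(word_probs, prefix, char):
    """P(next char | prefix) under a word LM, via the ratio of prefix masses.

    word_probs: dict mapping vocabulary words to LM probabilities
                (in the paper these come from a word-based RNN-LM;
                here a fixed toy distribution stands in for it).
    """
    # Mass of all words whose spelling begins with the current partial word.
    mass_prefix = sum(p for w, p in word_probs.items() if w.startswith(prefix))
    # Mass of the subset also consistent with the candidate next character.
    mass_ext = sum(p for w, p in word_probs.items()
                   if w.startswith(prefix + char))
    return mass_ext / mass_prefix if mass_prefix > 0 else 0.0

# Toy word-LM distribution (hypothetical values, for illustration only).
word_probs = {"speech": 0.4, "speed": 0.3, "spell": 0.2, "state": 0.1}

# After emitting "spe", the next character must continue some vocabulary word:
p_e = lookahead_char_prob(word_probs, "spe", "e")  # continues "speech"/"speed"
p_l = lookahead_char_prob(word_probs, "spe", "l")  # continues "spell"
print(round(p_e, 3), round(p_l, 3))  # → 0.778 0.222
```

In the actual system this score is combined with the CTC/attention scores during beam search, so character hypotheses are biased toward spellings of likely words without requiring a separate character-based LM.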
Pages: 389-396
Page count: 8
Related Papers
(50 total)
  • [1] Location-Based End-to-End Speech Recognition with Multiple Language Models
    Lin, Zhijie
    Lin, Kaiyang
    Chen, Shiling
    Li, Linlin
    Zhao, Zhou
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 9975 - 9976
  • [2] End-to-end speech recognition with Alignment RNN-Transducer
    Tian, Ying
    Li, Zerui
    Liu, Min
    Ouchi, Kazushige
    Yan, Long
    Zhao, Dan
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [3] IMPROVING RNN TRANSDUCER MODELING FOR END-TO-END SPEECH RECOGNITION
    Li, Jinyu
    Zhao, Rui
    Hu, Hu
    Gong, Yifan
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 114 - 121
  • [4] EESEN: END-TO-END SPEECH RECOGNITION USING DEEP RNN MODELS AND WFST-BASED DECODING
    Miao, Yajie
    Gowayyed, Mohammad
    Metze, Florian
    2015 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2015, : 167 - 174
  • [5] End-to-End Speech Recognition of Tamil Language
    Changrampadi, Mohamed Hashim
    Shahina, A.
    Narayanan, M. Badri
    Khan, A. Nayeemulla
    INTELLIGENT AUTOMATION AND SOFT COMPUTING, 2022, 32 (02): : 1309 - 1323
  • [6] Efficient minimum word error rate training of RNN-Transducer for end-to-end speech recognition
    Guo, Jinxi
    Tiwari, Gautam
    Droppo, Jasha
    Van Segbroeck, Maarten
    Huang, Che-Wei
    Stolcke, Andreas
    Maas, Roland
    INTERSPEECH 2020, 2020, : 2807 - 2811
  • [7] Spelling-Aware Word-Based End-to-End ASR
    Egorova, Ekaterina
    Vydana, Hari Krishna
    Burget, Lukas
    Cernocky, Jan Honza
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 1729 - 1733
  • [8] Sub-word Based End-to-End Speech Recognition for an Under-Resourced Language: Amharic
    Gebreegziabher, Nirayo Hailu
    Nuernberger, Andreas
    2020 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2020, : 3466 - 3470
  • [9] End-to-End Deep Neural Models for Automatic Speech Recognition for Polish Language
    Pondel-Sycz, Karolina
    Pietrzak, Agnieszka Paula
    Szymla, Julia
    INTERNATIONAL JOURNAL OF ELECTRONICS AND TELECOMMUNICATIONS, 2024, 70 (02) : 315 - 321
  • [10] Residual Language Model for End-to-end Speech Recognition
    Tsunoo, Emiru
    Kashiwagi, Yosuke
    Narisetty, Chaitanya
    Watanabe, Shinji
    INTERSPEECH 2022, 2022, : 3899 - 3903