END-TO-END SPEECH RECOGNITION WITH WORD-BASED RNN LANGUAGE MODELS

Cited: 0
Authors
Hori, Takaaki [1 ]
Cho, Jaejin [2 ]
Watanabe, Shinji [2 ]
Affiliations
[1] MERL, Cambridge, MA 02139 USA
[2] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA
Keywords
End-to-end speech recognition; language modeling; decoding; connectionist temporal classification; attention decoder
DOI
Not available
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper investigates the impact of word-based RNN language models (RNN-LMs) on the performance of end-to-end automatic speech recognition (ASR). In our prior work, we proposed a multi-level LM, in which character-based and word-based RNN-LMs are combined in hybrid CTC/attention-based ASR. Although this multi-level approach achieves significant error reduction on the Wall Street Journal (WSJ) task, two different LMs need to be trained and used for decoding, which increases the computational cost and memory usage. In this paper, we further propose a novel word-based RNN-LM that allows decoding with only the word-based LM: it provides look-ahead word probabilities to predict the next characters in place of the character-based LM, yielding accuracy competitive with the multi-level LM at lower computational cost. We demonstrate the efficacy of the word-based RNN-LMs on a larger corpus, LibriSpeech, in addition to the WSJ corpus used in our prior work. Furthermore, we show that the proposed model achieves 5.1% WER on the WSJ Eval'92 test set when the vocabulary size is increased, which is the best WER reported for end-to-end ASR systems on this benchmark.
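The look-ahead mechanism described in the abstract can be illustrated with a toy sketch (an assumption for illustration, not the paper's exact decoder): given a word-level LM distribution over the vocabulary and the characters emitted so far in the current word, the probability of the next character is the probability mass of vocabulary words consistent with the extended prefix, normalized by the mass of words consistent with the current prefix.

```python
def lookahead_char_prob(word_probs, prefix, char):
    """P(next char | prefix) under a word LM, via the ratio of prefix masses.

    word_probs: dict mapping vocabulary words to LM probabilities
                (in the paper these come from a word-based RNN-LM;
                here a fixed toy distribution stands in for it).
    """
    # Mass of all words whose spelling begins with the current partial word.
    mass_prefix = sum(p for w, p in word_probs.items() if w.startswith(prefix))
    # Mass of the subset also consistent with the candidate next character.
    mass_ext = sum(p for w, p in word_probs.items()
                   if w.startswith(prefix + char))
    return mass_ext / mass_prefix if mass_prefix > 0 else 0.0

# Toy word-LM distribution (hypothetical values, for illustration only).
word_probs = {"speech": 0.4, "speed": 0.3, "spell": 0.2, "state": 0.1}

# After emitting "spe", the next character must continue some vocabulary word:
p_e = lookahead_char_prob(word_probs, "spe", "e")  # continues "speech"/"speed"
p_l = lookahead_char_prob(word_probs, "spe", "l")  # continues "spell"
print(round(p_e, 3), round(p_l, 3))  # → 0.778 0.222
```

In the actual system this score is combined with the CTC/attention scores during beam search, so character hypotheses are biased toward spellings of likely words without requiring a separate character-based LM.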
Pages: 389-396
Page count: 8
Related Papers
(50 total)
  • [1] Location-Based End-to-End Speech Recognition with Multiple Language Models
    Lin, Zhijie
    Lin, Kaiyang
    Chen, Shiling
    Li, Linlin
    Zhao, Zhou
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 9975 - 9976
  • [2] End-to-end speech recognition with Alignment RNN-Transducer
    Tian, Ying
    Li, Zerui
    Liu, Min
    Ouchi, Kazushige
    Yan, Long
    Zhao, Dan
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [3] IMPROVING RNN TRANSDUCER MODELING FOR END-TO-END SPEECH RECOGNITION
    Li, Jinyu
    Zhao, Rui
    Hu, Hu
    Gong, Yifan
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 114 - 121
  • [4] EESEN: END-TO-END SPEECH RECOGNITION USING DEEP RNN MODELS AND WFST-BASED DECODING
    Miao, Yajie
    Gowayyed, Mohammad
    Metze, Florian
    2015 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2015, : 167 - 174
  • [5] End-to-End Speech Recognition of Tamil Language
    Changrampadi, Mohamed Hashim
    Shahina, A.
    Narayanan, M. Badri
    Khan, A. Nayeemulla
    INTELLIGENT AUTOMATION AND SOFT COMPUTING, 2022, 32 (02): : 1309 - 1323
  • [6] Efficient minimum word error rate training of RNN-Transducer for end-to-end speech recognition
    Guo, Jinxi
    Tiwari, Gautam
    Droppo, Jasha
    Van Segbroeck, Maarten
    Huang, Che-Wei
    Stolcke, Andreas
    Maas, Roland
    INTERSPEECH 2020, 2020, : 2807 - 2811
  • [7] Spelling-Aware Word-Based End-to-End ASR
    Egorova, Ekaterina
    Vydana, Hari Krishna
    Burget, Lukas
    Cernocky, Jan Honza
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 1729 - 1733
  • [8] Sub-word Based End-to-End Speech Recognition for an Under-Resourced Language: Amharic
    Gebreegziabher, Nirayo Hailu
    Nuernberger, Andreas
    2020 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2020, : 3466 - 3470
  • [9] End-to-End Deep Neural Models for Automatic Speech Recognition for Polish Language
    Pondel-Sycz, Karolina
    Pietrzak, Agnieszka Paula
    Szymla, Julia
    INTERNATIONAL JOURNAL OF ELECTRONICS AND TELECOMMUNICATIONS, 2024, 70 (02) : 315 - 321
  • [10] Residual Language Model for End-to-end Speech Recognition
    Tsunoo, Emiru
    Kashiwagi, Yosuke
    Narisetty, Chaitanya
    Watanabe, Shinji
    INTERSPEECH 2022, 2022, : 3899 - 3903