GRAPHEME-TO-PHONEME CONVERSION USING LONG SHORT-TERM MEMORY RECURRENT NEURAL NETWORKS

Cited by: 0
Authors
Rao, Kanishka [1 ]
Peng, Fuchun [1 ]
Sak, Hasim [1 ]
Beaufays, Francoise [1 ]
Affiliations
[1] Google Inc, Mountain View, CA 94043 USA
Keywords
speech recognition; pronunciation; RNN; LSTM; G2P; CTC;
DOI
Not available
Chinese Library Classification
O42 [Acoustics];
Subject Classification Codes
070206; 082403;
Abstract
Grapheme-to-phoneme (G2P) models are key components in speech recognition and text-to-speech systems, as they describe how words are pronounced. We propose a G2P model based on a Long Short-Term Memory (LSTM) recurrent neural network (RNN). In contrast to traditional joint-sequence based G2P approaches, an LSTM can take the full context of graphemes into account, transforming the problem from a series of grapheme-to-phoneme conversions into a single word-to-pronunciation conversion. Training a joint-sequence based G2P model requires explicit grapheme-to-phoneme alignments, which are not straightforward to obtain since graphemes and phonemes do not correspond one-to-one. The LSTM-based approach forgoes the need for such explicit alignments. We experiment with a unidirectional LSTM (ULSTM) using different output delays and a deep bidirectional LSTM (DBLSTM) with a connectionist temporal classification (CTC) layer. The DBLSTM-CTC model achieves a word error rate (WER) of 25.8% on the public CMU dataset for US English. Combining the DBLSTM-CTC model with a joint n-gram model yields a WER of 21.3%, a 9% relative improvement over the previous best WER of 23.4% from a hybrid system.
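The abstract's central architectural point is that a bidirectional LSTM trained with a CTC loss maps a grapheme sequence directly to a phoneme sequence, with CTC marginalizing over all monotonic alignments so that no explicit grapheme-to-phoneme alignment is ever constructed. Below is a minimal sketch of that idea; the framework (PyTorch), the layer sizes, the toy vocabulary sizes, and all identifiers are illustrative assumptions, not the paper's actual DBLSTM configuration.

import torch
import torch.nn as nn

class BLSTMCTCG2P(nn.Module):
    # Toy bidirectional-LSTM G2P model with a CTC output layer (sketch).
    def __init__(self, num_graphemes=30, num_phonemes=40, embed_dim=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(num_graphemes, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, num_layers=2,
                            bidirectional=True, batch_first=True)
        # One extra output class for the CTC blank symbol (index 0).
        self.proj = nn.Linear(2 * hidden, num_phonemes + 1)

    def forward(self, graphemes):                  # graphemes: (batch, T)
        x = self.embed(graphemes)                  # (batch, T, embed_dim)
        x, _ = self.lstm(x)                        # (batch, T, 2*hidden)
        return self.proj(x)                        # (batch, T, num_phonemes+1)

# One CTC training step on random toy data: note that no alignment
# between grapheme positions and phoneme positions is supplied.
model = BLSTMCTCG2P()
ctc = nn.CTCLoss(blank=0, zero_infinity=True)

batch, T, U = 4, 20, 8                             # T graphemes in, U phonemes out
graphemes = torch.randint(1, 30, (batch, T))
phonemes = torch.randint(1, 41, (batch, U))        # phoneme ids 1..40 (0 = blank)
input_lens = torch.full((batch,), T, dtype=torch.long)
target_lens = torch.full((batch,), U, dtype=torch.long)

log_probs = model(graphemes).log_softmax(-1).transpose(0, 1)  # (T, batch, C) for CTCLoss
loss = ctc(log_probs, phonemes, input_lens, target_lens)
loss.backward()
print(float(loss))

At inference time a pronunciation would be read off the CTC outputs by greedy or beam-search decoding, collapsing repeats and removing blanks; the WERs reported in the abstract come from the paper's own DBLSTM-CTC setup, not from this sketch.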
Pages: 4225-4229
Page count: 5
Related Papers
50 records in total
  • [1] Deep Bidirectional Long Short-Term Memory Recurrent Neural Networks for Grapheme-to-Phoneme Conversion utilizing Complex Many-to-Many Alignments
    Mousa, Amr El-Desoky
    Schuller, Bjoern
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 2836 - 2840
  • [2] LOW-RESOURCE GRAPHEME-TO-PHONEME CONVERSION USING RECURRENT NEURAL NETWORKS
    Jyothi, Preethi
    Hasegawa-Johnson, Mark
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 5030 - 5034
  • [3] Grapheme-to-Phoneme Conversion with Convolutional Neural Networks
    Yolchuyeva, Sevinj
    Nemeth, Geza
    Gyires-Toth, Balint
    [J]. APPLIED SCIENCES-BASEL, 2019, 9 (06):
  • [4] Grapheme-to-Phoneme Conversion for Thai using Neural Regression Models
    Yamasaki, Tomohiro
    [J]. NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, : 4251 - 4255
  • [5] Neural Machine Translation for Multilingual Grapheme-to-Phoneme Conversion
    Sokolov, Alex
    Rohlin, Tracy
    Rastrow, Ariya
    [J]. INTERSPEECH 2019, 2019, : 2065 - 2069
  • [6] NEURAL GRAPHEME-TO-PHONEME CONVERSION WITH PRE-TRAINED GRAPHEME MODELS
    Dong, Lu
    Guo, Zhi-Qiang
    Tan, Chao-Hong
    Hu, Ya-Jun
    Jiang, Yuan
    Ling, Zhen-Hua
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6202 - 6206
  • [7] Grapheme-to-Phoneme Conversion using Conditional Random Fields
    Illina, Irina
    Fohr, Dominique
    Jouvet, Denis
    [J]. 12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 2324 - 2327
  • [8] VOICE CONVERSION USING DEEP BIDIRECTIONAL LONG SHORT-TERM MEMORY BASED RECURRENT NEURAL NETWORKS
    Sun, Lifa
    Kang, Shiyin
    Li, Kun
    Meng, Helen
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4869 - 4873
  • [9] IMPROVING GRAPHEME-TO-PHONEME CONVERSION BY INVESTIGATING COPYING MECHANISM IN RECURRENT ARCHITECTURES
    Niranjan, Abhishek
    Shaik, M. Ali Basha
    [J]. 2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 442 - 448
  • [10] Sequence-to-Sequence Neural Net Models for Grapheme-to-Phoneme Conversion
    Yao, Kaisheng
    Zweig, Geoffrey
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 3330 - 3334