GRAPHEME-TO-PHONEME CONVERSION USING LONG SHORT-TERM MEMORY RECURRENT NEURAL NETWORKS

Cited by: 0
Authors
Rao, Kanishka [1 ]
Peng, Fuchun [1 ]
Sak, Hasim [1 ]
Beaufays, Francoise [1 ]
Affiliations
[1] Google Inc, Mountain View, CA 94043 USA
Keywords
speech recognition; pronunciation; RNN; LSTM; G2P; CTC;
DOI
Not available
Chinese Library Classification
O42 [Acoustics];
Subject Classification Codes
070206; 082403;
Abstract
Grapheme-to-phoneme (G2P) models are key components in speech recognition and text-to-speech systems, as they describe how words are pronounced. We propose a G2P model based on a Long Short-Term Memory (LSTM) recurrent neural network (RNN). In contrast to traditional joint-sequence based G2P approaches, an LSTM can take the full context of graphemes into account, transforming the problem from a series of grapheme-to-phoneme conversions into a single word-to-pronunciation conversion. Training a joint-sequence based G2P model requires explicit grapheme-to-phoneme alignments, which are not straightforward to obtain since graphemes and phonemes do not correspond one-to-one. The LSTM-based approach forgoes the need for such explicit alignments. We experiment with a unidirectional LSTM (ULSTM) using different output delays and a deep bidirectional LSTM (DBLSTM) with a connectionist temporal classification (CTC) layer. The DBLSTM-CTC model achieves a word error rate (WER) of 25.8% on the public CMU dataset for US English. Combining the DBLSTM-CTC model with a joint n-gram model yields a WER of 21.3%, a 9% relative improvement over the previous best WER of 23.4% from a hybrid system.
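The abstract's central architectural point is that a bidirectional LSTM trained with a CTC loss maps a grapheme sequence directly to a phoneme sequence, with CTC marginalizing over all monotonic alignments so that no explicit grapheme-to-phoneme alignment is ever constructed. Below is a minimal sketch of that idea; the framework (PyTorch), the layer sizes, the toy vocabulary sizes, and all identifiers are illustrative assumptions, not the paper's actual DBLSTM configuration.

import torch
import torch.nn as nn

class BLSTMCTCG2P(nn.Module):
    # Toy bidirectional-LSTM G2P model with a CTC output layer (sketch).
    def __init__(self, num_graphemes=30, num_phonemes=40, embed_dim=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(num_graphemes, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, num_layers=2,
                            bidirectional=True, batch_first=True)
        # One extra output class for the CTC blank symbol (index 0).
        self.proj = nn.Linear(2 * hidden, num_phonemes + 1)

    def forward(self, graphemes):                  # graphemes: (batch, T)
        x = self.embed(graphemes)                  # (batch, T, embed_dim)
        x, _ = self.lstm(x)                        # (batch, T, 2*hidden)
        return self.proj(x)                        # (batch, T, num_phonemes+1)

# One CTC training step on random toy data: note that no alignment
# between grapheme positions and phoneme positions is supplied.
model = BLSTMCTCG2P()
ctc = nn.CTCLoss(blank=0, zero_infinity=True)

batch, T, U = 4, 20, 8                             # T graphemes in, U phonemes out
graphemes = torch.randint(1, 30, (batch, T))
phonemes = torch.randint(1, 41, (batch, U))        # phoneme ids 1..40 (0 = blank)
input_lens = torch.full((batch,), T, dtype=torch.long)
target_lens = torch.full((batch,), U, dtype=torch.long)

log_probs = model(graphemes).log_softmax(-1).transpose(0, 1)  # (T, batch, C) for CTCLoss
loss = ctc(log_probs, phonemes, input_lens, target_lens)
loss.backward()
print(float(loss))

At inference time a pronunciation would be read off the CTC outputs by greedy or beam-search decoding, collapsing repeats and removing blanks; the WERs reported in the abstract come from the paper's own DBLSTM-CTC setup, not from this sketch.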
Pages: 4225-4229
Page count: 5
Related Papers
50 records in total
  • [1] Deep Bidirectional Long Short-Term Memory Recurrent Neural Networks for Grapheme-to-Phoneme Conversion utilizing Complex Many-to-Many Alignments
    Mousa, Amr El-Desoky
    Schuller, Bjoern
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 2836 - 2840
  • [2] LOW-RESOURCE GRAPHEME-TO-PHONEME CONVERSION USING RECURRENT NEURAL NETWORKS
    Jyothi, Preethi
    Hasegawa-Johnson, Mark
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 5030 - 5034
  • [3] Grapheme-to-Phoneme Conversion with Convolutional Neural Networks
    Yolchuyeva, Sevinj
    Nemeth, Geza
    Gyires-Toth, Balint
    [J]. APPLIED SCIENCES-BASEL, 2019, 9 (06):
  • [4] Grapheme-to-Phoneme Conversion for Thai using Neural Regression Models
    Yamasaki, Tomohiro
    [J]. NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, : 4251 - 4255
  • [5] Neural Machine Translation for Multilingual Grapheme-to-Phoneme Conversion
    Sokolov, Alex
    Rohlin, Tracy
    Rastrow, Ariya
    [J]. INTERSPEECH 2019, 2019, : 2065 - 2069
  • [6] NEURAL GRAPHEME-TO-PHONEME CONVERSION WITH PRE-TRAINED GRAPHEME MODELS
    Dong, Lu
    Guo, Zhi-Qiang
    Tan, Chao-Hong
    Hu, Ya-Jun
    Jiang, Yuan
    Ling, Zhen-Hua
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6202 - 6206
  • [7] Grapheme-to-Phoneme Conversion using Conditional Random Fields
    Illina, Irina
    Fohr, Dominique
    Jouvet, Denis
    [J]. 12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 2324 - 2327
  • [8] VOICE CONVERSION USING DEEP BIDIRECTIONAL LONG SHORT-TERM MEMORY BASED RECURRENT NEURAL NETWORKS
    Sun, Lifa
    Kang, Shiyin
    Li, Kun
    Meng, Helen
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4869 - 4873
  • [9] IMPROVING GRAPHEME-TO-PHONEME CONVERSION BY INVESTIGATING COPYING MECHANISM IN RECURRENT ARCHITECTURES
    Niranjan, Abhishek
    Shaik, M. Ali Basha
    [J]. 2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 442 - 448
  • [10] Sequence-to-Sequence Neural Net Models for Grapheme-to-Phoneme Conversion
    Yao, Kaisheng
    Zweig, Geoffrey
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 3330 - 3334