GRAPHEME-TO-PHONEME CONVERSION USING LONG SHORT-TERM MEMORY RECURRENT NEURAL NETWORKS

被引:0
|
作者
Rao, Kanishka [1 ]
Peng, Fuchun [1 ]
Sak, Hasim [1 ]
Beaufays, Francoise [1 ]
机构
[1] Google Inc, Mountain View, CA 94043 USA
关键词
speech recognition; pronunciation; RNN; LSTM; G2P; CTC;
D O I
暂无
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
Grapheme-to-phoneme (G2P) models are key components in speech recognition and text-to-speech systems as they describe how words are pronounced. We propose a G2P model based on a Long Short-Term Memory (LSTM) recurrent neural network (RNN). In contrast to traditional joint-sequence based G2P approaches, LSTMs have the flexibility of taking into consideration the full context of graphemes and transform the problem from a series of grapheme-to-phoneme conversions to a word-to-pronunciation conversion. Training joint-sequence based G2P require explicit graphemeto-phoneme alignments which are not straightforward since graphemes and phonemes don't correspond one-to-one. The LSTM based approach forgoes the need for such explicit alignments. We experiment with unidirectional LSTM (ULSTM) with different kinds of output delays and deep bidirectional LSTM (DBLSTM) with a connectionist temporal classification (CTC) layer. The DBLSTM-CTC model achieves a word error rate (WER) of 25.8% on the public CMU dataset for US English. Combining the DBLSTM-CTC model with a joint n-gram model results in a WER of 21.3%, which is a 9% relative improvement compared to the previous best WER of 23.4% from a hybrid system.
引用
收藏
页码:4225 / 4229
页数:5
相关论文
共 50 条
  • [21] Short-Term Traffic Prediction Using Long Short-Term Memory Neural Networks
    Abbas, Zainab
    Al-Shishtawy, Ahmad
    Girdzijauskas, Sarunas
    Vlassov, Vladimir
    [J]. 2018 IEEE INTERNATIONAL CONGRESS ON BIG DATA (IEEE BIGDATA CONGRESS), 2018, : 57 - 65
  • [22] Language Identification in Short Utterances Using Long Short-Term Memory (LSTM) Recurrent Neural Networks
    Zazo, Ruben
    Lozano-Diez, Alicia
    Gonzalez-Dominguez, Javier
    Toledano, Doroteo T.
    Gonzalez-Rodriguez, Joaquin
    [J]. PLOS ONE, 2016, 11 (01):
  • [23] Using Ant Colony Optimization to Optimize Long Short-Term Memory Recurrent Neural Networks
    ElSaid, AbdElRahman
    El Jamiy, Fatima
    Higgins, James
    Wild, Brandon
    Desell, Travis
    [J]. GECCO'18: PROCEEDINGS OF THE 2018 GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE, 2018, : 13 - 20
  • [24] Solving the Phoneme Conflict in Grapheme-to-Phoneme Conversion Using a Two-Stage Neural Network-Based Approach
    Kheang, Seng
    Katsurada, Kouichi
    Iribe, Yurie
    Nitta, Tsuneo
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2014, E97D (04): : 901 - 910
  • [25] Kazakh and Russian Languages Identification Using Long Short-Term Memory Recurrent Neural Networks
    Kozhirbayev, Zhanibek
    Yessenbayev, Zhandos
    Karabalayeva, Muslima
    [J]. 2017 11TH IEEE INTERNATIONAL CONFERENCE ON APPLICATION OF INFORMATION AND COMMUNICATION TECHNOLOGIES (AICT 2017), 2017, : 342 - 346
  • [26] Simplified Gating in Long Short-term Memory (LSTM) Recurrent Neural Networks
    Lu, Yuzhen
    Salem, Fathi M.
    [J]. 2017 IEEE 60TH INTERNATIONAL MIDWEST SYMPOSIUM ON CIRCUITS AND SYSTEMS (MWSCAS), 2017, : 1601 - 1604
  • [27] Detecting Overlapping Speech with Long Short-Term Memory Recurrent Neural Networks
    Geiger, Juergen T.
    Eyben, Florian
    Schuller, Bjoern
    Rigoll, Gerhard
    [J]. 14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 1667 - 1671
  • [28] Long Short-Term Memory Recurrent Neural Networks for Antibacterial Peptide Identification
    Youmans, Michael
    Spainhour, Christian
    Qiu, Peng
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2017, : 498 - 502
  • [29] Long Short-Term Memory Based Recurrent Neural Networks for Collaborative Filtering
    Zou, Lixin
    Gu, Yulong
    Song, Jiaxing
    Liu, Weidong
    Yao, Yuan
    [J]. 2017 IEEE SMARTWORLD, UBIQUITOUS INTELLIGENCE & COMPUTING, ADVANCED & TRUSTED COMPUTED, SCALABLE COMPUTING & COMMUNICATIONS, CLOUD & BIG DATA COMPUTING, INTERNET OF PEOPLE AND SMART CITY INNOVATION (SMARTWORLD/SCALCOM/UIC/ATC/CBDCOM/IOP/SCI), 2017,
  • [30] Long and Short-Term Recommendations with Recurrent Neural Networks
    Devooght, Robin
    Bersini, Hugues
    [J]. PROCEEDINGS OF THE 25TH CONFERENCE ON USER MODELING, ADAPTATION AND PERSONALIZATION (UMAP'17), 2017, : 13 - 21