Deep Bidirectional Long Short-Term Memory Recurrent Neural Networks for Grapheme-to-Phoneme Conversion utilizing Complex Many-to-Many Alignments

Cited by: 10
Authors
Mousa, Amr El-Desoky [1]
Schuller, Bjoern [1, 2]
Affiliations
[1] Univ Passau, Chair Complex & Intelligent Syst, Passau, Germany
[2] Imperial Coll London, Dept Comp, London, England
Funding
EU Horizon 2020
Keywords
grapheme-to-phoneme conversion; long short-term memory; many-to-many alignments
DOI
10.21437/Interspeech.2016-1229
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Efficient grapheme-to-phoneme (G2P) conversion models are considered indispensable components for achieving state-of-the-art performance in modern automatic speech recognition (ASR) and text-to-speech (TTS) systems. The role of these models is to provide such systems with a means to generate accurate pronunciations for unseen words. Recent work in this domain is based on recurrent neural networks (RNNs) that are capable of translating grapheme sequences into phoneme sequences while taking into account the full context of the graphemes. To achieve high performance with these models, utilizing explicit alignment information has been found essential. The quality of the G2P model depends heavily on the imposed alignment constraints. In this paper, a novel approach is proposed that uses complex many-to-many G2P alignments to improve the performance of G2P models based on deep bidirectional long short-term memory (BLSTM) RNNs. Extensive experiments cover models with different numbers of hidden layers, projection layers, input splicing windows, and varying alignment schemes. Complex alignments are observed to significantly improve performance on the publicly available CMUDict US English dataset. The results are compared with previously published results.
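To make the notion of a many-to-many alignment concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes an alignment is given as a list of (grapheme chunk, phoneme chunk) pairs and converts it into equal-length input and output token sequences of the kind a sequence-tagging model could be trained on. The function name to_tagging_pairs, the "<eps>" symbol for silent graphemes, the "_" joiner, and the example alignments are all assumptions introduced here for illustration; the phoneme strings follow CMUDict with stress markers removed.

```python
from typing import List, Tuple

# An alignment is a list of (grapheme_chunk, phoneme_chunk) pairs.
# Phonemes inside a chunk are separated by spaces; an empty phoneme
# chunk marks a silent grapheme chunk. (Hypothetical representation.)
Alignment = List[Tuple[str, str]]


def to_tagging_pairs(alignment: Alignment) -> Tuple[List[str], List[str]]:
    """Convert aligned chunks into equal-length input/output token lists.

    Multi-phoneme chunks are merged into one output token joined by "_",
    and silent chunks are mapped to the placeholder "<eps>".
    """
    inputs = [graphemes for graphemes, _ in alignment]
    outputs = [
        phonemes.replace(" ", "_") if phonemes else "<eps>"
        for _, phonemes in alignment
    ]
    return inputs, outputs


# "taxi": the single grapheme "x" maps to two phonemes (one-to-many).
taxi = [("t", "T"), ("a", "AE"), ("x", "K S"), ("i", "IY")]

# "phone": "ph" maps to one phoneme (many-to-one); final "e" is silent.
phone = [("ph", "F"), ("o", "OW"), ("n", "N"), ("e", "")]

taxi_in, taxi_out = to_tagging_pairs(taxi)
phone_in, phone_out = to_tagging_pairs(phone)

print(taxi_in, taxi_out)
print(phone_in, phone_out)
```

Because each aligned pair becomes exactly one input token and one output token, the two sequences have the same length, which is the property an explicit alignment supplies to a frame-by-frame tagging model.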
Pages: 2836-2840
Page count: 5