Predicting Pronunciations with Syllabification and Stress with Recurrent Neural Networks

Cited by: 11
Authors
van Esch, Daan [1]
Chua, Mason [1]
Rao, Kanishka [1]
Affiliation
[1] Google Inc, Mountain View, CA 94043 USA
Source
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016
Keywords
LSTM; pronunciation; syllabification; stress
DOI
10.21437/Interspeech.2016-1419
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Word pronunciations, consisting of phoneme sequences and the associated syllabification and stress patterns, are vital for both speech recognition and text-to-speech (TTS) systems. For speech recognition, phoneme sequences for words may be learned from audio data. We train recurrent neural network (RNN) based models to predict the syllabification and stress pattern for such pronunciations, making them usable for TTS. We find that these RNN models significantly outperform naive rule-based models for almost all languages we tested. Further, we find that using the spelling as features in addition to the phoneme sequence improves the stress prediction model. Finally, we train a single RNN model to predict the phoneme sequence, syllabification, and stress for a given word. For several languages, this single RNN outperforms similar models trained specifically for either phoneme sequence or stress prediction. We report an exhaustive comparison of these approaches for twenty languages.
Pages: 2841-2845
Page count: 5