Parameter selection for prosodic modelling in a restricted-domain Spanish text-to-speech system

Cited by: 0
Authors
Montero, JM [1]
de Córdoba, R [1]
Macías-Guarasa, J [1]
San-Segundo, R [1]
Gutiérrez-Arriola, J [1]
Pardo, JM [1]
Affiliations
[1] Univ Politecn Madrid, Speech Technol Grp, Dept Elect Engn, ETSI Telecomun, E-28040 Madrid, Spain
Keywords
Prosody; F0 modeling; duration modeling; text-to-speech; artificial neural networks; parameter selection; parameter coding
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Prosodic modeling is one of the most important tasks in developing a new text-to-speech synthesizer, especially for a high-quality, female-voice, restricted-domain system. Our twofold objective is to obtain accurate predictors of both the F0 curve and phoneme duration by minimizing the model estimation error in a Spanish text-to-speech system. To achieve these complementary aims, we needed to find the factors that most influence prosodic values in a given language. We used neural networks and experimented with different combinations of parameters to find those that yield the minimum estimation error. In the restricted-domain environment, the variation across patterns is reduced, and there are more instances of each parameter vector in the database. Under these conditions, the neural network proves to be an excellent modeling tool. The resulting system predicts prosody with very good results (for duration: an RMS error of 15.5 ms and a correlation coefficient of 0.8975; for F0: an RMS error of 19.80 Hz and a relative RMS error of 0.43), clearly improving on our previous rule-based system.
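The abstract reports three kinds of evaluation figures: an RMS error, a correlation value, and a relative RMS error. The following is a minimal sketch of how such figures are commonly computed from predicted and observed values. It is not the authors' evaluation code. The function names are hypothetical, the sample numbers are invented for illustration only, and the normalisation used for the relative RMS error (division by the standard deviation of the observed values) is an assumption, since the abstract does not define it.

```python
import math


def rms_error(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def correlation(predicted, observed):
    """Pearson correlation coefficient between predictions and observations."""
    n = len(observed)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)


def relative_rms_error(predicted, observed):
    """RMS error divided by the standard deviation of the observed values.

    ASSUMPTION: this normalisation is one common convention; the abstract
    does not state which definition the authors used.
    """
    n = len(observed)
    mean_o = sum(observed) / n
    std_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed) / n)
    return rms_error(predicted, observed) / std_o


# Invented F0 values in Hz, for illustration only (not data from the paper).
observed_f0 = [100.0, 120.0, 140.0, 160.0]
predicted_f0 = [110.0, 110.0, 150.0, 150.0]

f0_rms = rms_error(predicted_f0, observed_f0)
f0_corr = correlation(predicted_f0, observed_f0)
f0_rel = relative_rms_error(predicted_f0, observed_f0)
print(f0_rms, f0_corr, f0_rel)
```

Under this convention, a relative RMS error below 1 means the predictor does better than always guessing the mean of the observed values.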
Pages: 93-98
Number of pages: 6