Parameter selection for prosodic modelling in a restricted-domain Spanish text-to-speech system

Cited by: 0
Authors
Montero, JM [1]
de Córdoba, R [1]
Macías-Guarasa, J [1]
San-Segundo, R [1]
Gutiérrez-Arriola, J [1]
Pardo, JM [1]
Affiliations
[1] Univ Politecn Madrid, Speech Technol Grp, Dept Elect Engn, ETSI Telecomun, E-28040 Madrid, Spain
Keywords
Prosody; F0 modeling; duration modeling; text-to-speech; artificial neural networks; parameter selection; parameter coding
DOI
Not available
CLC classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Prosodic modeling is one of the most important tasks in developing a new text-to-speech synthesizer, especially for a high-quality, female-voice, restricted-domain system. Our twofold objective is to obtain accurate predictors for both the F0 curve and phoneme duration by minimizing the model estimation error in a Spanish text-to-speech system. To achieve these complementary aims, we needed to find the factors that most influence prosodic values in a given language. We used neural networks and experimented with different combinations of parameters to find those that yield the minimum estimation error. In a restricted-domain environment, the variation across patterns is reduced and there are more instances of each parameter vector in the database, so the neural network proves to be an excellent tool for the modeling. The resulting system predicts prosody with very good results (for duration: an RMS error of 15.5 ms and a correlation factor of 0.8975; for F0: an RMS error of 19.80 Hz and a relative RMS error of 0.43), clearly improving on our previous rule-based system.
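The abstract reports RMS error, a correlation factor, and a relative RMS error. The sketch below shows one common way to compute such figures from predicted and observed values. It is illustrative only and not the authors' evaluation code: the function names are hypothetical, and the definition of "relative RMS error" used here (RMS error divided by the standard deviation of the observed values) is an assumption, since the abstract does not define it.

```python
import math


def rms_error(predicted, observed):
    """Root-mean-square difference between two equal-length sequences."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(predicted))


def pearson_correlation(predicted, observed):
    """Pearson correlation coefficient between predictions and observations."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)


def relative_rms_error(predicted, observed):
    """ASSUMED definition: RMS error normalized by the (population)
    standard deviation of the observed values."""
    n = len(observed)
    mean_o = sum(observed) / n
    std_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed) / n)
    return rms_error(predicted, observed) / std_o
```

Under this assumed definition, a relative RMS error below 1 means the predictor does better than always guessing the mean of the observed values.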
Pages: 93-98
Number of pages: 6