Parameter selection for prosodic modelling in a restricted-domain Spanish text-to-speech system

Cited by: 0
Authors
Montero, JM [1]
de Córdoba, R [1]
Macías-Guarasa, J [1]
San-Segundo, R [1]
Gutiérrez-Arriola, J [1]
Pardo, JM [1]
Affiliations
[1] Univ Politecn Madrid, Speech Technol Grp, Dept Elect Engn, ETSI Telecomun, E-28040 Madrid, Spain
Keywords
Prosody; F0 modeling; duration modeling; text-to-speech; artificial neural networks; parameter selection; parameter coding;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Prosodic modeling is one of the most important tasks in developing a new text-to-speech synthesizer, especially for a high-quality, female-voice, restricted-domain system. Our twofold objective is to obtain accurate predictors of both the F0 curve and phoneme duration by minimizing the model estimation error in a Spanish text-to-speech system. To achieve these complementary aims, we needed to find the factors that most influence prosodic values in a given language. We used neural networks and experimented with different combinations of parameters to find those that yield the minimum estimation error. In a restricted-domain environment, the variation across patterns is reduced, and there are more instances of each parameter vector in the database. As a result, the neural network proves to be an excellent tool for the modeling. The resulting system predicts prosody with very good results (for duration: an RMS error of 15.5 ms and a correlation factor of 0.8975; for F0: an RMS error of 19.80 Hz and a relative RMS error of 0.43), clearly improving on our previous rule-based system.
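The abstract reports three kinds of figures: RMS error, a correlation factor, and a relative RMS error. As a rough illustration of how such figures can be computed from predicted and reference values, here is a minimal Python sketch. The abstract does not define "relative RMS error", so the normalization used here (RMS error divided by the standard deviation of the reference values) is an assumption, and the function names and sample numbers are hypothetical, not taken from the paper.

```python
import math


def rmse(pred, target):
    """Root-mean-square error between predictions and reference values."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target))


def pearson(pred, target):
    """Pearson correlation coefficient between predictions and reference values."""
    n = len(target)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in target)
    return cov / math.sqrt(var_p * var_t)


def relative_rmse(pred, target):
    """RMS error normalized by the population standard deviation of the
    reference values. ASSUMPTION: the source does not state its normalization."""
    n = len(target)
    mean_t = sum(target) / n
    std_t = math.sqrt(sum((t - mean_t) ** 2 for t in target) / n)
    return rmse(pred, target) / std_t


if __name__ == "__main__":
    # Made-up phoneme durations in milliseconds, for illustration only.
    reference = [62.0, 85.0, 110.0, 74.0, 95.0]
    predicted = [70.0, 80.0, 100.0, 78.0, 90.0]
    print("RMS error (ms):", rmse(predicted, reference))
    print("Correlation:", pearson(predicted, reference))
    print("Relative RMS error:", relative_rmse(predicted, reference))
```

Under this assumed normalization, a relative RMS error below 1 means the predictor's error is smaller than the spread of the reference values themselves.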
Pages: 93-98
Number of pages: 6
Related papers
50 in total
  • [1] A Prosodic Text-to-Speech System for Yoruba Language
    Akinwonmi, Akintoba Emmanuel
    Alese, Boniface Kayode
    2013 8TH INTERNATIONAL CONFERENCE FOR INTERNET TECHNOLOGY AND SECURED TRANSACTIONS (ICITST), 2013, : 630 - 635
  • [2] Prosodic annotation in a Thai Text-to-speech system
    Department of Electrical and Computer Engineering, Citadel, Military College of South Carolina, 171 Moultrie Street, Charleston, SC 29409, United States
    PACLIC - Pacific Asia Conf. Lang., Inf. Comput., Proc., 2007, (405-414):
  • [3] Prosodic Annotation in a Thai Text-to-speech System
    Potisuk, Siripong
    PACLIC 21: THE 21ST PACIFIC ASIA CONFERENCE ON LANGUAGE, INFORMATION AND COMPUTATION, PROCEEDINGS, 2007, : 405 - 414
  • [4] TIME-DOMAIN PROSODIC MODIFICATIONS FOR TEXT-TO-SPEECH SYNTHESIZER
    Lopatka, Kuba
    Suchomski, Piotr
    Czyzewski, Andrzej
    SPA 2010: SIGNAL PROCESSING ALGORITHMS, ARCHITECTURES, ARRANGEMENTS, AND APPLICATIONS CONFERENCE PROCEEDINGS, 2010, : 73 - 77
  • [5] IMPLEMENTING PROSODIC PHRASING FOR AN EXPERIMENTAL TEXT-TO-SPEECH SYSTEM
    BACHENKO, J
    FITZPATRICK, E
    LACY, J
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1987, 81 : S79 - S79
  • [6] A Prosodic Mandarin Text-to-Speech System Based on Tacotron
    Zhang, Chuxiong
    Zhang, Sheng
    Zhong, Haibing
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 165 - 169
  • [7] A prosodic Turkish text-to-speech synthesizer
    Vural, E
    Oflazer, K
    PROCEEDINGS OF THE IEEE 12TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, 2004, : 458 - 460
  • [8] Selection of the most significant parameters for duration modelling in a Spanish text-to-speech system using neural networks
    Córdoba, R
    Montero, JM
    Gutiérrez, JM
    Vallejo, JA
    Enriquez, E
    Pardo, JM
    COMPUTER SPEECH AND LANGUAGE, 2002, 16 (02): : 183 - 203
  • [9] A prosodic diphone database for Korean text-to-speech synthesis system
    Yoon, K
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, 2005, 3406 : 425 - 428
  • [10] TEXT-TO-SPEECH CONVERSION SYSTEM TO DEVELOP PROSODIC RULES.
    Mikuni, Ichiro
    Ohta, Kozo
    Denshi Gijutsu Sogo Kenkyusho Iho/Bulletin of the Electrotechnical Laboratory, 1988, 52 (03): : 82 - 87