Parameter selection for prosodic modelling in a restricted-domain Spanish text-to-speech system

Cited by: 0

Authors:

Montero, JM [1]

de Córdoba, R [1]

Macías-Guarasa, J [1]

San-Segundo, R [1]

Gutiérrez-Arriola, J [1]

Pardo, JM [1]

Affiliation:

[1] Univ Politecn Madrid, Speech Technol Grp, Dept Elect Engn, ETSI Telecomun, E-28040 Madrid, Spain

Source:

Image Processing, Biomedicine, Multimedia, Financial Engineering and Manufacturing, Vol. 18, 2004

Keywords:

Prosody; F0 modeling; duration modeling; text-to-speech; artificial neural networks; parameter selection; parameter coding

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Prosodic modeling is one of the most important tasks in developing a new text-to-speech synthesizer, especially in a female-voice, high-quality, restricted-domain system. Our twofold objective is to obtain accurate predictors for both the F0 curve and phoneme duration by minimizing the model estimation error in a Spanish text-to-speech system. To achieve these complementary aims, we needed to find the factors that most influence prosodic values in a given language. We used neural networks and experimented with different combinations of parameters to find those that yield the minimum estimation error. In the restricted-domain environment, the variation among patterns is reduced, and there are more instances of each parameter vector in the database. As a result, the neural network proves to be an excellent tool for the modeling. The resulting system predicts prosody with very good results (for duration: 15.5 ms RMS error and a correlation factor of 0.8975; for F0: 19.80 Hz RMS error and a relative RMS error of 0.43), clearly improving on our previous rule-based system.
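The abstract reports its results as an RMS error and a correlation factor. As a minimal sketch of how such figures are commonly computed, the Python below uses the standard root-mean-square error and Pearson correlation definitions. The function names and the sample durations are hypothetical and are not taken from the paper. The abstract does not state how its "relative RMS error" is normalized, so that metric is left out.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def pearson(predicted, observed):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)


# Hypothetical phoneme durations in milliseconds (illustrative only).
predicted_ms = [62.0, 85.0, 110.0, 48.0, 97.0]
observed_ms = [70.0, 80.0, 125.0, 55.0, 90.0]

print(f"RMS error: {rmse(predicted_ms, observed_ms):.2f} ms")
print(f"Correlation: {pearson(predicted_ms, observed_ms):.4f}")
```

In this reading, a lower RMS error and a correlation closer to 1 both indicate that predicted durations track the measured ones more closely.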


Pages: 93-98

Page count: 6
