Parameter selection for prosodic modelling in a restricted-domain Spanish text-to-speech system

被引:0
|
作者
Montero, JM [1 ]
de Córdoba, R [1 ]
Macías-Guarasa, J [1 ]
San-Segundo, R [1 ]
Gutiérrez-Arriola, J [1 ]
Pardo, JM [1 ]
机构
[1] Univ Politecn Madrid, Speech Technol Grp, Dept Elect Engn, ETSI Telecomun, E-28040 Madrid, Spain
关键词
Prosody; FO modeling; duration modeling; text-to-speech; artificial neural networks; paranieter selection; parameter coding;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
The prosodic modeling is one of the most important tasks for developing a new text-to-speech synthesizer, especially in a female-voice high-quality restricted-domain system. Our double objective is to get accurate predictors for both the FO curve and phoneme duration by minimizing the model estimation error in a Spanish text-to-speech system. To achieve these complementary aims we needed to find the factors that most influence prosodic values ill a given language. We have used neural networks and experimented with the different combinations of parameters that yield the minimum error in the estimation. In the restricted-domain environment the variation in the different patterns is reduced, and there are more instances of each parameter vector in the database. This way, the neural network proves to be an excellent tool for the modeling. The resulting system predicts prosody with very good results (for duration: 15.5 ms in RMS and a correlation factor of 0.8975; for F0: 19.80 Hz in RMS and a relative RMS error of 0.43) that clearly improves Our previous rule-based system.
引用
收藏
页码:93 / 98
页数:6
相关论文
共 50 条
  • [31] Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech
    Liu, Bin
    Liu, Rui
    Li, Haizhou
    MAN-MACHINE SPEECH COMMUNICATION, NCMMSC 2024, 2025, 2312 : 326 - 337
  • [32] Derivation of prosody for text-to-speech from prosodic sentence structure
    Quene, Hugo
    Kager, Rene
    Computer Speech and Language, 1992, 6 (01): : 77 - 98
  • [33] An Overview of the ILSP Unit Selection Text-to-Speech Synthesis System
    Tsiakoulis, Pirros
    Karabetsos, Sotiris
    Chalamandaris, Aimilios
    Raptis, Spyros
    ARTIFICIAL INTELLIGENCE: METHODS AND APPLICATIONS, 2014, 8445 : 370 - 383
  • [34] Automatic conversion from lexical words to prosodic words for mandarin text-to-speech system
    Shao, Yanqiu
    Han, Jiqing
    Liu, Ting
    Zhao, Yongzhen
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2007, 10 (01) : 45 - 55
  • [35] A method for estimating prosodic symbol from text for Japanese text-to-speech synthesis
    Magata, K
    Hamagami, T
    Komura, M
    ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 1373 - 1376
  • [36] Slovenian text-to-speech system
    Sef, T
    ISCAS 2000: IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS - PROCEEDINGS, VOL V: EMERGING TECHNOLOGIES FOR THE 21ST CENTURY, 2000, : 41 - 44
  • [37] A Hakka text-to-speech system
    Yu, Hsiu-Min
    Hwang, Hsin-Te
    Lin, Dong-Yi
    Chen, Sin-Horng
    CHINESE SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, 2006, 4274 : 241 - +
  • [38] A TEXT-TO-SPEECH CONVERSION SYSTEM
    KLATT, DH
    ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 1982, 184 (SEP): : 11 - CINF
  • [39] Text-to-speech system for Danish
    1600, Publ by Elsevier Science Publishers B.V., Amsterdam, Neth
  • [40] A Mandarin text-to-speech system
    Hwang, SH
    Chen, SH
    Wang, YR
    ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 1421 - 1424