A hybrid model for text-to-speech synthesis

Cited by: 3
Authors
Violaro, F. [1]
Boeffard, O. [1]
Affiliation
[1] UNICAMP, FEEC, DECOM, BR-13083970 Campinas, Brazil
Funding
São Paulo Research Foundation (FAPESP), Brazil
Keywords
prosodic modifications; speech synthesis
DOI
10.1109/89.709668
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This paper describes a hybrid model developed for high-quality, concatenation-based text-to-speech synthesis. The speech signal undergoes a pitch-synchronous analysis and is decomposed into a harmonic component, with a variable maximum frequency, plus a noise component. The harmonic component is modeled as a sum of sinusoids whose frequencies are integer multiples of the fundamental frequency (pitch). The noise component is modeled as a random excitation applied to an LPC filter. In unvoiced segments, the harmonic component is set to zero. When the pitch is modified, a new set of harmonic parameters is computed by resampling the spectral envelope at the new harmonic frequencies. To synthesize the harmonic component under duration and/or pitch modifications, a phase correction is applied to the harmonic parameters. The harmonic component is synthesized with the sinusoidal model, and the noise component with the LPC model combined with an overlap-add procedure. This hybrid model allows independent and continuous control of the duration and pitch of the synthesized speech. Comparative evaluation tests in a text-to-speech environment showed that the hybrid model performs better than the time-domain pitch-synchronous overlap-add (TD-PSOLA) model.
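The abstract describes the synthesis side of the harmonic-plus-noise model only in words. The following Python/NumPy sketch illustrates its main ingredients under explicit assumptions; it is not the authors' implementation. The function names are hypothetical, the spectral envelope is assumed to be a linear interpolation between the analysed harmonic peaks, the LPC filter is assumed to be the all-pole filter 1/A(z) with A(z) = 1 + a1 z^-1 + ... + ap z^-p, the overlap-add uses a Hann window chosen for illustration, and the phase correction the paper applies under duration and pitch modifications is omitted because the abstract does not specify it.

```python
import numpy as np


def resample_harmonics(f0_old, amps_old, f0_new, f_max):
    """Re-evaluate harmonic amplitudes at k * f0_new by sampling the envelope.

    Assumption: the envelope is a linear interpolation between the analysed
    harmonic peaks (k * f0_old, A_k); the paper's envelope may differ.
    """
    freqs_old = f0_old * np.arange(1, len(amps_old) + 1)
    n_new = int(f_max // f0_new)  # keep harmonics below the maximum voiced frequency
    freqs_new = f0_new * np.arange(1, n_new + 1)
    amps_new = np.interp(freqs_new, freqs_old, amps_old)  # holds end values outside range
    return freqs_new, amps_new


def harmonic_component(freqs, amps, phases, n, fs):
    """Sum of sinusoids at the given harmonic frequencies (n samples at rate fs).

    For unvoiced frames, pass empty arrays: the result is all zeros.
    """
    t = np.arange(n) / fs
    return (amps[:, None] * np.cos(2 * np.pi * freqs[:, None] * t + phases[:, None])).sum(axis=0)


def noise_component(a, gain, n, rng):
    """White Gaussian noise filtered by the all-pole LPC filter 1 / A(z).

    Assumed sign convention: y[i] = e[i] - sum_j a[j-1] * y[i-j].
    """
    e = gain * rng.standard_normal(n)
    y = np.zeros(n)
    for i in range(n):
        acc = e[i]
        for j, aj in enumerate(a, start=1):
            if i - j >= 0:
                acc -= aj * y[i - j]
        y[i] = acc
    return y


def overlap_add(frames, hop):
    """Hann-windowed overlap-add of equal-length frames spaced `hop` samples apart."""
    length = len(frames[0])
    win = np.hanning(length)
    out = np.zeros(hop * (len(frames) - 1) + length)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + length] += win * frame
    return out


if __name__ == "__main__":
    fs = 16000
    amps = np.array([1.0, 0.6, 0.3, 0.1])  # hypothetical amplitudes analysed at f0 = 100 Hz
    freqs, new_amps = resample_harmonics(100.0, amps, 125.0, f_max=400.0)  # raise pitch
    voiced = harmonic_component(freqs, new_amps, np.zeros(len(freqs)), 320, fs)
    rng = np.random.default_rng(0)
    noise = overlap_add([noise_component([-0.9], 0.05, 320, rng) for _ in range(4)], hop=160)
    print(freqs, new_amps, voiced.shape, noise.shape)
```

In this sketch, raising f0 from 100 Hz to 125 Hz with a 400 Hz maximum voiced frequency keeps three harmonics (125, 250 and 375 Hz), whose amplitudes are read off the interpolated envelope rather than copied from the original harmonics. This is what lets pitch change without shifting the formant structure.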
Pages: 426-434
Number of pages: 9