A hybrid model for text-to-speech synthesis

Cited by: 3

Authors:

Violaro, F [1]

Boeffard, O [1]

Affiliations:

[1] UNICAMP, FEEC, DECOM, BR-13083970 Campinas, Brazil

Source:

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 1998 / Vol. 6 / No. 5

Funding:

São Paulo Research Foundation (Brazil);

Keywords:

prosodic modifications; speech synthesis;

DOI:

10.1109/89.709668

Chinese Library Classification:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

This paper describes a hybrid model developed for high-quality, concatenation-based text-to-speech synthesis. The speech signal undergoes pitch-synchronous analysis and is decomposed into a harmonic component, with a variable maximum frequency, plus a noise component. The harmonic component is modeled as a sum of sinusoids whose frequencies are integer multiples of the pitch frequency. The noise component is modeled as a random excitation applied to an LPC filter. In unvoiced segments, the harmonic component is set to zero. When the pitch is modified, a new set of harmonic parameters is obtained by resampling the spectral envelope at the new harmonic frequencies. To synthesize the harmonic component under duration and/or pitch modifications, a phase correction is introduced into the harmonic parameters. The harmonic component is synthesized with the sinusoidal model, and the noise component with the LPC model combined with an overlap-and-add procedure. This hybrid model enables independent and continuous control of the duration and pitch of the synthesized speech. Comparative evaluation tests in a text-to-speech environment showed that the hybrid model performs better than the time-domain pitch-synchronous overlap-add (TD-PSOLA) model.
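
The harmonic part of the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`harmonic_frame`, `resample_envelope`, `interp`), the use of linear interpolation between harmonic amplitudes as the "spectral envelope", and all numeric values are assumptions made for illustration. The noise component, the phase correction, and the overlap-and-add step are not shown.

```python
import math


def interp(x, xs, ys):
    """Piecewise-linear interpolation of (xs, ys) at x, clamped at both ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            w = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + w * (ys[i] - ys[i - 1])


def harmonic_frame(amps, phases, f0, fs, n_samples):
    """Harmonic component only: sum of sinusoids at integer multiples of f0."""
    frame = []
    for n in range(n_samples):
        t = n / fs
        sample = 0.0
        for k, (a, phi) in enumerate(zip(amps, phases), start=1):
            sample += a * math.cos(2.0 * math.pi * k * f0 * t + phi)
        frame.append(sample)
    return frame


def resample_envelope(amps, f0_old, f0_new, f_max):
    """Re-evaluate harmonic amplitudes at multiples of f0_new up to f_max.

    Assumption: the envelope is approximated by linear interpolation
    between the original harmonic amplitudes.
    """
    old_freqs = [k * f0_old for k in range(1, len(amps) + 1)]
    new_amps = []
    k = 1
    while k * f0_new <= f_max:
        new_amps.append(interp(k * f0_new, old_freqs, amps))
        k += 1
    return new_amps


# Hypothetical frame: three harmonics of a 100 Hz pitch, raised to 150 Hz.
new_amps = resample_envelope([1.0, 0.5, 0.25], 100.0, 150.0, 300.0)
frame = harmonic_frame(new_amps, [0.0] * len(new_amps), 150.0, 8000.0, 80)
```

In this toy case the new harmonics fall at 150 Hz and 300 Hz, so the modified frame has two sinusoids instead of three; the maximum frequency of the harmonic band (`f_max` here) stays fixed while the harmonic count changes with the pitch.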

Pages: 426-434

Page count: 9
