A hybrid model for text-to-speech synthesis

Cited by: 3

Authors:

Violaro, F [1]

Boeffard, O [1]

Affiliations:

[1] UNICAMP, FEEC, DECOM, BR-13083970 Campinas, Brazil

Source:

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 1998 / Vol. 6 / No. 5

Funding:

São Paulo Research Foundation (FAPESP), Brazil;

Keywords:

prosodic modifications; speech synthesis;

DOI:

10.1109/89.709668

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

This paper describes a hybrid model developed for high-quality, concatenation-based, text-to-speech synthesis. The speech signal is submitted to a pitch-synchronous analysis and decomposed into a harmonic component, with a variable maximum frequency, plus a noise component. The harmonic component is modeled as a sum of sinusoids with frequencies multiple of the pitch. The noise component is modeled as a random excitation applied to an LPC filter. In unvoiced segments, the harmonic component is made equal to zero. In the presence of pitch modifications, a new set of harmonic parameters is evaluated by resampling the spectrum envelope at the new harmonic frequencies. For the synthesis of the harmonic component in the presence of duration and/or pitch modifications, a phase correction is introduced into the harmonic parameters. The sinusoidal model of synthesis is used for the harmonic component and the LPC model combined with an overlap and add procedure is used for the noise synthesis. This hybrid model enables independent and continuous control of duration and pitch of the synthesized speech. Comparative evaluation tests made in a text-to-speech environment have shown that the hybrid model assures better performance than the time-domain pitch-synchronous overlap-add (TD-PSOLA) model.
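As a rough illustration of two steps named in the abstract (resampling the spectrum envelope at new harmonic frequencies, then synthesizing the harmonic component as a sum of sinusoids), the following is a minimal sketch and not the authors' implementation. The function names, the use of linear interpolation for the envelope, and the zero phases in the usage note are assumptions made for illustration. The paper's phase correction and its LPC-based noise component are not reproduced here.

```python
import numpy as np

def resample_envelope(f0_old, amps_old, f0_new, f_max):
    """Estimate harmonic amplitudes at a new pitch (hypothetical helper).

    The envelope is taken to be known only at the old harmonics
    k * f0_old and is linearly interpolated at the new harmonics
    k * f0_new up to f_max. Linear interpolation is an assumption;
    np.interp holds the end values outside the sampled range.
    """
    amps_old = np.asarray(amps_old, dtype=float)
    freqs_old = f0_old * np.arange(1, len(amps_old) + 1)
    n_new = int(f_max // f0_new)  # number of harmonics below f_max
    freqs_new = f0_new * np.arange(1, n_new + 1)
    amps_new = np.interp(freqs_new, freqs_old, amps_old)
    return freqs_new, amps_new

def synth_harmonic(freqs, amps, phases, fs, n_samples):
    """Sum of sinusoids with the given frequencies, amplitudes and phases."""
    t = np.arange(n_samples) / fs
    freqs = np.asarray(freqs, dtype=float)[:, None]
    amps = np.asarray(amps, dtype=float)[:, None]
    phases = np.asarray(phases, dtype=float)[:, None]
    return np.sum(amps * np.cos(2.0 * np.pi * freqs * t + phases), axis=0)
```

For example, an envelope sampled at harmonics of 100 Hz can be resampled for a 200 Hz pitch with `resample_envelope(100.0, [1.0, 0.5, 0.25, 0.125], 200.0, 400.0)`, and the resulting frequencies and amplitudes passed to `synth_harmonic` together with a phase array of the same length.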


Pages: 426-434

Page count: 9
