A hybrid model for text-to-speech synthesis

Cited by: 3
Authors
Violaro, F [1]
Boeffard, O [1]
Affiliation
[1] UNICAMP, FEEC, DECOM, BR-13083970 Campinas, Brazil
Source
Funding
São Paulo Research Foundation (Brazil);
Keywords
prosodic modifications; speech synthesis;
DOI
10.1109/89.709668
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
This paper describes a hybrid model developed for high-quality, concatenation-based text-to-speech synthesis. The speech signal undergoes pitch-synchronous analysis and is decomposed into a harmonic component, with a variable maximum frequency, plus a noise component. The harmonic component is modeled as a sum of sinusoids whose frequencies are multiples of the pitch frequency. The noise component is modeled as a random excitation applied to an LPC filter. In unvoiced segments, the harmonic component is set to zero. When the pitch is modified, a new set of harmonic parameters is obtained by resampling the spectral envelope at the new harmonic frequencies. To synthesize the harmonic component under duration and/or pitch modifications, a phase correction is introduced into the harmonic parameters. The harmonic component is synthesized with the sinusoidal model, and the noise component with the LPC model combined with an overlap-and-add procedure. This hybrid model enables independent and continuous control of the duration and pitch of the synthesized speech. Comparative evaluation tests in a text-to-speech environment have shown that the hybrid model performs better than the time-domain pitch-synchronous overlap-add (TD-PSOLA) model.
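The decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the linear interpolation of the envelope, the first-order LPC filter, and all numeric values are assumptions chosen for illustration. The phase correction and the overlap-add step mentioned in the abstract are omitted.

```python
import numpy as np

def harmonic_component(f0, amps, phases, n_samples, fs):
    """Sum of sinusoids at integer multiples of the pitch frequency f0 (Hz)."""
    t = np.arange(n_samples) / fs
    k = np.arange(1, len(amps) + 1)[:, None]  # harmonic indices 1..K
    return np.sum(amps[:, None] * np.cos(2 * np.pi * k * f0 * t + phases[:, None]), axis=0)

def resample_envelope(f0_old, amps_old, f0_new, f_max):
    """Harmonic amplitudes at the new pitch, read off the old spectral envelope.

    Assumption: the envelope is linearly interpolated between the old harmonics;
    the abstract does not say which interpolation the paper uses.
    """
    freqs_old = f0_old * np.arange(1, len(amps_old) + 1)
    n_new = int(f_max // f0_new)  # harmonics up to the maximum voiced frequency
    freqs_new = f0_new * np.arange(1, n_new + 1)
    return np.interp(freqs_new, freqs_old, amps_old)

def lpc_noise(a, gain, n_samples, rng):
    """White noise through the all-pole filter 1/A(z), A(z) = 1 + a[0] z^-1 + ..."""
    e = gain * rng.standard_normal(n_samples)
    y = np.zeros(n_samples)
    for n in range(n_samples):
        acc = e[n]
        for i, a_i in enumerate(a, start=1):
            if n - i >= 0:
                acc -= a_i * y[n - i]
        y[n] = acc
    return y

# Hypothetical frame: raise the pitch from 100 Hz to 125 Hz, then add LPC-shaped noise.
fs, f_max = 16000, 4000.0
amps = 1.0 / np.arange(1, int(f_max // 100.0) + 1)         # 40 harmonics at 100 Hz
new_amps = resample_envelope(100.0, amps, 125.0, f_max)     # 32 harmonics at 125 Hz
frame = (harmonic_component(125.0, new_amps, np.zeros(len(new_amps)), 320, fs)
         + lpc_noise(np.array([-0.5]), 0.01, 320, np.random.default_rng(0)))
```

Because the harmonic amplitudes are re-read from the envelope instead of being shifted with the harmonics, the spectral shape stays in place while the pitch changes, which is the point of the resampling step in the abstract.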
Pages: 426-434
Number of pages: 9