A hybrid model for text-to-speech synthesis

被引:3
|
作者
Violaro, F [1 ]
Boeffard, O [1 ]
机构
[1] UNICAMP, FEEC, DECOM, BR-13083970 Campinas, Brazil
来源
基金
巴西圣保罗研究基金会;
关键词
prosodic modifications; speech synthesis;
D O I
10.1109/89.709668
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
This paper describes a hybrid model developed for high-quality, concatenation-based, text-to-speech synthesis. The speech signal is submitted to a pitch-synchronous analysis and decomposed into a harmonic component, with a variable maximum frequency, plus a noise component. The harmonic component is modeled as a sum of sinusoids with frequencies multiple of the pitch. The noise component is modeled as a random excitation applied to an LPC filter. In unvoiced segments, the harmonic component is made equal to zero. In the presence of pitch modifications, a new set of harmonic parameters is evaluated by resampling the spectrum envelope at the new harmonic frequencies. For the synthesis of the harmonic component in the presence of duration and/or pitch modifications, a phase correction is introduced into the harmonic parameters. The sinusoidal model of synthesis is used for the harmonic component and the LPC model combined with an overlap and add procedure is used for the noise synthesis. This hybrid model enables independent and continuous control of duration and pitch of the synthesized speech. Comparative evaluation tests made in a text-to-speech environment have shown that the hybrid model assures better performance than the time-domain pitch-synchronous overlap-add (TD-PSOLA) model.
引用
收藏
页码:426 / 434
页数:9
相关论文
共 50 条
  • [31] Spectral voice conversion for text-to-speech synthesis
    Kain, A
    Macon, MW
    [J]. PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6, 1998, : 285 - 288
  • [32] Paraphrase generation to improve Text-To-Speech Synthesis
    Putois, Ghislain
    Chevelu, Jonathan
    Boidin, Cedric
    [J]. 11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 198 - 201
  • [33] Diphone Databases for Lithuanian text-to-speech synthesis
    Kasparaitis, P
    [J]. INFORMATICA, 2005, 16 (02) : 193 - 202
  • [34] Wavelet analysis used in text-to-speech synthesis
    Kobayashi, M
    Sakamoto, M
    Saito, T
    Hashimoto, Y
    Nishimura, M
    Suzuki, K
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-ANALOG AND DIGITAL SIGNAL PROCESSING, 1998, 45 (08): : 1125 - 1129
  • [35] RyanSpeech: A Corpus for Conversational Text-to-Speech Synthesis
    Zandie, Rohola
    Mahoor, Mohammad H.
    Madsen, Julia
    Emamian, Eshrat S.
    [J]. INTERSPEECH 2021, 2021, : 2751 - 2755
  • [36] Text-to-speech synthesis with an Indian language perspective
    Panda, Soumya Priyadarsini
    Nayak, Ajit Kumar
    Patnaik, Srikanta
    [J]. INTERNATIONAL JOURNAL OF GRID AND UTILITY COMPUTING, 2015, 6 (3-4) : 170 - 178
  • [37] ASSIGNMENT OF SEGMENTAL DURATION IN TEXT-TO-SPEECH SYNTHESIS
    VANSANTEN, JPH
    [J]. COMPUTER SPEECH AND LANGUAGE, 1994, 8 (02): : 95 - 128
  • [38] Database processing for Spanish text-to-speech synthesis
    Gómez-Mena, J
    Cardo, M
    Madrid, JL
    Prades, C
    [J]. TEXT, SPEECH AND DIALOGUE, PROCEEDINGS, 2000, 1902 : 248 - 252
  • [39] Statistical Text-to-Speech Synthesis with Improved Dynamics
    Tiomkin, Stas
    Malah, David
    [J]. INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 1841 - 1844
  • [40] FACTORIZED CONTEXT MODELLING FOR TEXT-TO-SPEECH SYNTHESIS
    Lu, Heng
    King, Simon
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7849 - 7853