ASSIGNMENT OF SEGMENTAL DURATION IN TEXT-TO-SPEECH SYNTHESIS

Cited by: 57
Author
VANSANTEN, JPH
Affiliation
[1] AT&T Bell Laboratories, Murray Hill, NJ
Source
COMPUTER SPEECH AND LANGUAGE | 1994 / Vol. 8 / Issue 02
DOI
10.1006/csla.1994.1005
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In natural speech, durations of phonetic segments are strongly dependent on contextual factors. For synthetic speech to sound natural, the module for computing segmental duration (the duration system) must mimic these contextual effects as closely as possible. Construction of a duration system is obstructed by two facets of segmental duration: (1) interactions between contextual factors, and (2) sparsity of training data. This paper describes a new duration system in which a central role is played by duration models, in the form of equations consisting of sums and products such as: duration(/i/, voiced, stressed) = A(/i/) + B(voiced) × C(stressed). These models, which we call sums-of-products models, can capture the types of interaction patterns often found in duration data, where one factor typically amplifies (but does not reverse) the effects of other factors. Yet these models are mathematically sufficiently tractable for robust parameter estimation in the presence of severe sparsity. The overall architecture of the system consists of a category structure, or tree, that divides the space into similarly behaved cases; for each of these categories a separate sums-of-products model is developed and its parameters are estimated. Perceptual evaluation results are reported for an implementation in the AT&T Bell Laboratories text-to-speech system.
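The example equation in the abstract can be made concrete with a small sketch. The parameter values, the table names beyond A, B, and C, and the helper `sums_of_products` below are all hypothetical, chosen only to illustrate how a sum of product terms is evaluated; they are not taken from the paper, and parameter estimation is not shown.

```python
from math import prod

# Hypothetical parameter tables. The numbers are illustrative only and are
# NOT values from the paper (A and B in milliseconds, C dimensionless).
PARAMS = {
    "A": {"/i/": 60.0, "/a/": 80.0},
    "B": {"voiced": 30.0, "voiceless": 20.0},
    "C": {"stressed": 1.5, "unstressed": 1.0},
}

# A model is a list of product terms. Each term lists (table, factor) pairs
# whose looked-up parameters are multiplied; the terms are then summed.
# This encodes: duration = A(phone) + B(voicing) * C(stress)
MODEL = [
    [("A", "phone")],
    [("B", "voicing"), ("C", "stress")],
]


def sums_of_products(model, params, context):
    """Evaluate a sums-of-products model for one segment in one context."""
    return sum(
        prod(params[table][context[factor]] for table, factor in term)
        for term in model
    )


context = {"phone": "/i/", "voicing": "voiced", "stress": "stressed"}
duration = sums_of_products(MODEL, PARAMS, context)
print(duration)  # 60.0 + 30.0 * 1.5 = 105.0
```

With positive parameters, stress scales the voicing term without changing its sign, which matches the abstract's description of one factor amplifying, but not reversing, the effect of another. In the architecture the abstract describes, a separate model of this kind would be attached to each category of the tree.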
Pages: 95-128
Page count: 34
Related papers
50 in total
  • [1] Modeling segmental duration in German text-to-speech synthesis
    Mobius, B
    vanSanten, J
    [J]. ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 2395 - 2398
  • [2] Modeling segmental duration for Turkish text-to-speech
    Öztürk, Ö
    Çiloglu, T
    [J]. PROCEEDINGS OF THE IEEE 12TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, 2004, : 272 - 275
  • [3] CLUSTERING OF DURATION PATTERNS IN SPEECH FOR TEXT-TO-SPEECH SYNTHESIS
    Sreelekshmi, K. S.
    Gopinath, Deepa P.
    [J]. 2012 ANNUAL IEEE INDIA CONFERENCE (INDICON), 2012, : 1122 - 1127
  • [4] TEXT-TO-SPEECH SYNTHESIS
    SPROAT, RW
    OLIVE, JP
    [J]. AT&T TECHNICAL JOURNAL, 1995, 74 (02): : 35 - 44
  • [5] Optimal state duration assignment in hidden Markov model-based text-to-speech synthesis system
    Khan, Najeeb Ullah
    Lee, Jung-Chul
    [J]. ELECTRONICS LETTERS, 2015, 51 (12) : 941 - 942
  • [6] Segmental intelligibility of four currently used text-to-speech synthesis methods
    Venkatagiri, HS
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2003, 113 (04): : 2095 - 2104
  • [7] Lexical stress assignment model for the Slovenian text-to-speech synthesis system
    Sef, T
    [J]. PROCEEDINGS OF THE 2004 INTERNATIONAL SYMPOSIUM ON INTELLIGENT MULTIMEDIA, VIDEO AND SPEECH PROCESSING, 2004, : 683 - 686
  • [8] Text and Speech Corpora for Text-To-Speech Synthesis of Tales
    Doukhan, David
    Rosset, Sophie
    Rilliard, Albert
    d'Alessandro, Christophe
    Adda-Decker, Martine
    [J]. LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2012, : 1003 - 1010
  • [9] Multilingual text-to-speech synthesis
    Black, AW
    Lenzo, KA
    [J]. 2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL III, PROCEEDINGS: IMAGE AND MULTIDIMENSIONAL SIGNAL PROCESSING SPECIAL SESSIONS, 2004, : 761 - 764
  • [10] An introduction to text-to-speech synthesis
    Fitzpatrick, E
    [J]. COMPUTATIONAL LINGUISTICS, 1998, 24 (02) : 322 - 323