ASSIGNMENT OF SEGMENTAL DURATION IN TEXT-TO-SPEECH SYNTHESIS

Times cited: 57
Authors
van Santen, J. P. H.
Affiliation
[1] AT&T Bell Laboratories, Murray Hill, NJ
Source
COMPUTER SPEECH AND LANGUAGE | 1994 / Vol. 8 / No. 2
DOI
10.1006/csla.1994.1005
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In natural speech, the durations of phonetic segments depend strongly on contextual factors. For synthetic speech to sound natural, the module that computes segmental duration (the duration system) must mimic these contextual effects as closely as possible. Two facets of segmental duration obstruct the construction of a duration system: (1) interactions between contextual factors, and (2) sparsity of training data. This paper describes a new duration system in which a central role is played by duration models in the form of equations consisting of sums and products, such as: duration(/i/, voiced, stressed) = A(/i/) + B(voiced) × C(stressed). These models, which we call sums-of-products models, can capture the types of interaction patterns often found in duration data, where one factor typically amplifies, but does not reverse, the effects of other factors. At the same time, these models are mathematically tractable enough for robust parameter estimation in the presence of severe sparsity. The overall architecture of the system consists of a category structure, or tree, that divides the space of contexts into categories of similarly behaving cases; for each category, a separate sums-of-products model is developed and its parameters are estimated. Perceptual evaluation results are reported for an implementation in the AT&T Bell Laboratories text-to-speech system.
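The abstract's example equation and its tree-plus-models architecture can be illustrated with a short Python sketch. This is not the paper's implementation. A single hypothetical split (vowels vs. other segments) stands in for the category tree, and every parameter value is a made-up placeholder, not an estimate from the paper. The names `SumsOfProductsModel`, `category`, and `duration` are likewise hypothetical.

```python
# Minimal sketch of a sums-of-products duration model of the form
#   duration = A(phone) + B(voicing) * C(stress)
# ASSUMPTIONS: all parameter values are hypothetical placeholders (ms),
# and the one-split "category tree" is illustrative, not from the paper.
from dataclasses import dataclass
from typing import Dict, Tuple

Context = Tuple[str, str, str]  # (phone, voicing, stress)


@dataclass
class SumsOfProductsModel:
    """Predicts duration as A[phone] + B[voicing] * C[stress]."""
    a: Dict[str, float]
    b: Dict[str, float]
    c: Dict[str, float]

    def predict(self, ctx: Context) -> float:
        phone, voicing, stress = ctx
        return self.a[phone] + self.b[voicing] * self.c[stress]


# Hypothetical category tree with a single split; each leaf has its own model.
VOWELS = {"i", "a", "u"}

MODELS = {
    "vowel": SumsOfProductsModel(
        a={"i": 40.0, "a": 60.0, "u": 45.0},
        b={"voiced": 50.0, "voiceless": 30.0},
        c={"stressed": 1.5, "unstressed": 1.0},
    ),
    "other": SumsOfProductsModel(
        a={"s": 70.0, "t": 50.0},
        b={"voiced": 10.0, "voiceless": 15.0},
        c={"stressed": 1.2, "unstressed": 1.0},
    ),
}


def category(ctx: Context) -> str:
    """Route a context to a leaf of the hypothetical category tree."""
    return "vowel" if ctx[0] in VOWELS else "other"


def duration(ctx: Context) -> float:
    """Look up the leaf's model and evaluate it on the context."""
    return MODELS[category(ctx)].predict(ctx)


if __name__ == "__main__":
    # Voicing effect for /i/ under each stress condition.
    for stress in ("stressed", "unstressed"):
        gap = duration(("i", "voiced", stress)) - duration(("i", "voiceless", stress))
        print(stress, gap)
```

Every B and C value here is positive, and C(stressed) exceeds C(unstressed). As a result, stress enlarges the voicing effect for /i/ (30 ms stressed vs. 20 ms unstressed) without changing its sign. This is the amplify-but-not-reverse pattern the abstract describes.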
Pages: 95-128
Page count: 34
Related papers
50 in total
  • [31] Database processing for Spanish text-to-speech synthesis
    Gómez-Mena, J
    Cardo, M
    Madrid, JL
    Prades, C
    [J]. TEXT, SPEECH AND DIALOGUE, PROCEEDINGS, 2000, 1902 : 248 - 252
  • [32] Statistical Text-to-Speech Synthesis with Improved Dynamics
    Tiomkin, Stas
    Malah, David
    [J]. INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008 : 1841 - 1844
  • [33] A stochastic model of intonation for text-to-speech synthesis
    Véronis, J
    Di Cristo, P
    Courtois, F
    Chaumette, C
    [J]. SPEECH COMMUNICATION, 1998, 26 (04) : 233 - 244
  • [34] FACTORIZED CONTEXT MODELLING FOR TEXT-TO-SPEECH SYNTHESIS
    Lu, Heng
    King, Simon
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013 : 7849 - 7853
  • [35] A single chip solution for text-to-speech synthesis
    Aktan, O
    Baskaya, IF
    Dündar, G
    [J]. PROCEEDINGS OF THE 2005 EUROPEAN CONFERENCE ON CIRCUIT THEORY AND DESIGN, VOL 3, 2005 : 449 - 452
  • [36] Accented Text-to-Speech Synthesis With Limited Data
    Zhou, Xuehao
    Zhang, Mingyang
    Zhou, Yi
    Wu, Zhizheng
    Li, Haizhou
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 1699 - 1711
  • [37] Statistical Text-to-Speech Synthesis of Spanish Subtitles
    Piqueras, S.
    del-Agua, M. A.
    Gimenez, A.
    Civera, J.
    Juan, A.
    [J]. ADVANCES IN SPEECH AND LANGUAGE TECHNOLOGIES FOR IBERIAN LANGUAGES, IBERSPEECH 2014, 2014, 8854 : 40 - 48
  • [38] CHARACTERIZATION OF RHYTHMIC PATTERNS FOR TEXT-TO-SPEECH SYNTHESIS
    BARBOSA, P
    BAILLY, G
    [J]. SPEECH COMMUNICATION, 1994, 15 (1-2) : 127 - 137
  • [39] A Generalized LR parser for text-to-speech synthesis
    Heggtveit, PO
    [J]. ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996 : 1429 - 1432
  • [40] THE SYNTHESIS RULES IN A CHINESE TEXT-TO-SPEECH SYSTEM
    LEE, LS
    TSENG, CY
    MING, OY
    [J]. IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1989, 37 (09) : 1309 - 1320