ASSIGNMENT OF SEGMENTAL DURATION IN TEXT-TO-SPEECH SYNTHESIS

Cited by: 57
Author
VANSANTEN, JPH
Affiliation
[1] AT&T Bell Laboratories, Murray Hill, NJ
Source
COMPUTER SPEECH AND LANGUAGE | 1994 / Volume 8 / Issue 02
DOI
10.1006/csla.1994.1005
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In natural speech, durations of phonetic segments are strongly dependent on contextual factors. For synthetic speech to sound natural, the module for computing segmental duration (the duration system) must mimic these contextual effects as closely as possible. Construction of a duration system is obstructed by two facets of segmental duration: (1) interactions between contextual factors, and (2) sparsity of training data. This paper describes a new duration system in which a central role is played by duration models, in the form of equations consisting of sums and products, such as: duration(/i/, voiced, stressed) = A(/i/) + B(voiced) × C(stressed). These models, which we call sums-of-products models, can capture the types of interaction patterns often found in duration data, where one factor typically amplifies, but does not reverse, the effects of other factors. Yet these models are sufficiently tractable mathematically to allow robust parameter estimation in the presence of severe sparsity. The overall architecture of the system consists of a category structure, or tree, that divides the space into cases that behave similarly; for each of these categories a separate sums-of-products model is developed and its parameters are estimated. Perceptual evaluation results are reported for an implementation in the AT&T Bell Laboratories text-to-speech system.
Pages: 95 - 128
Page count: 34
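
The sums-of-products form given in the abstract, duration = A(/i/) + B(voiced) × C(stressed), can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration: the class name SumsOfProductsModel, the factor names, the category key "vowel", and all parameter values are hypothetical and are not taken from the paper, which estimates its parameters from data. The sketch only shows how such a model is evaluated and why a multiplicative term amplifies an effect without reversing it.

```python
from dataclasses import dataclass
from typing import Dict, List

# A term is a product of per-factor parameters; a model is a sum of terms.
# Each term maps a factor name to a table of {factor level: parameter value}.
Term = Dict[str, Dict[str, float]]


@dataclass
class SumsOfProductsModel:
    terms: List[Term]

    def predict(self, context: Dict[str, str]) -> float:
        """Sum, over terms, of the product of the parameters selected by context."""
        total = 0.0
        for term in self.terms:
            product = 1.0
            for factor, table in term.items():
                product *= table[context[factor]]
            total += product
        return total


# Hypothetical parameters (milliseconds for A and B, a unitless scale for C).
A = {"i": 60.0, "a": 80.0}
B = {"voiced": 40.0, "voiceless": 20.0}
C = {"stressed": 1.5, "unstressed": 1.0}

# duration = A(phone) + B(voicing) * C(stress)
vowel_model = SumsOfProductsModel(
    terms=[{"phone": A}, {"voicing": B, "stress": C}]
)

# The abstract describes a category tree with one model per category;
# a flat dictionary stands in for that tree here (an assumption).
models_by_category = {"vowel": vowel_model}


def duration(category: str, context: Dict[str, str]) -> float:
    return models_by_category[category].predict(context)


stressed_voiced = duration(
    "vowel", {"phone": "i", "voicing": "voiced", "stress": "stressed"}
)
stressed_voiceless = duration(
    "vowel", {"phone": "i", "voicing": "voiceless", "stress": "stressed"}
)
unstressed_voiced = duration(
    "vowel", {"phone": "i", "voicing": "voiced", "stress": "unstressed"}
)
unstressed_voiceless = duration(
    "vowel", {"phone": "i", "voicing": "voiceless", "stress": "unstressed"}
)

# The voicing effect keeps its sign in both stress conditions and is
# larger under stress: amplified, not reversed.
effect_stressed = stressed_voiced - stressed_voiceless
effect_unstressed = unstressed_voiced - unstressed_voiceless
print(stressed_voiced, effect_stressed, effect_unstressed)
```

With these made-up values, stress scales the voicing effect by the factor C(stressed) / C(unstressed), so the direction of the effect is the same in both conditions as long as all C values are positive.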