Automatic generation of synthesis units and prosodic information for Chinese concatenative synthesis

Cited by: 39
Authors
Wu, CH [1]
Chen, JH [1]
Affiliation
[1] Natl Cheng Kung Univ, Dept Comp Sci & Informat Engn, Tainan, Taiwan
Keywords
Chinese text-to-speech conversion; synthesis units; prosodic information; concatenative synthesis; pitch contour; syllable duration
DOI
10.1016/S0167-6393(00)00075-3
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this paper, some approaches to the generation of synthesis units and prosodic information are proposed for Mandarin Chinese text-to-speech (TTS) conversion. Monosyllables are adopted as the basic synthesis units. A set of synthesis units is selected from a large continuous speech database based on two cost functions, which minimize the inter- and intra-syllable distortion. The speech database is also employed to establish a word-prosody-based template tree according to the linguistic features: tone combination, word length, part-of-speech (POS) of the word, and word position in a phrase. This template tree stores the prosodic features, including pitch contour, average energy, and syllable duration of a word, for possible combinations of linguistic features. Two modules for sentence intonation and template selection are proposed to generate the target prosodic templates. The experimental results showed that the synthesized prosodic features matched quite well with their original counterparts. Evaluation by subjective experiments also confirmed the satisfactory performance of these approaches. (C) 2001 Elsevier Science B.V. All rights reserved.
Pages: 219-237
Number of pages: 19