Automatic generation of synthesis units and prosodic information for Chinese concatenative synthesis

Times cited: 39
Authors
Wu, CH [1]
Chen, JH [1]
Affiliation
[1] National Cheng Kung University, Department of Computer Science and Information Engineering, Tainan, Taiwan
Keywords
Chinese text-to-speech conversion; synthesis units; prosodic information; concatenative synthesis; pitch contour; syllable duration
DOI
10.1016/S0167-6393(00)00075-3
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this paper, approaches to generating synthesis units and prosodic information are proposed for Mandarin Chinese text-to-speech (TTS) conversion. Monosyllables are adopted as the basic synthesis units. A set of synthesis units is selected from a large continuous speech database using two cost functions that minimize inter- and intra-syllable distortion. The speech database is also used to build a word-prosody-based template tree indexed by four linguistic features: tone combination, word length, part of speech (POS) of the word, and word position in a phrase. For each possible combination of these features, the template tree stores the prosodic features of a word, including its pitch contour, average energy, and syllable durations. Two modules, one for sentence intonation and one for template selection, are proposed to generate the target prosodic templates. Experimental results showed that the synthesized prosodic features matched their original counterparts quite well, and subjective evaluation also confirmed the satisfactory performance of these approaches. (C) 2001 Elsevier Science B.V. All rights reserved.
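The abstract lists the four features that index the word-prosody template tree but does not say how the tree is laid out or searched. The following Python sketch is a hypothetical illustration only, not the authors' implementation. The level order (tone combination, then word length, then POS, then position), the names `TemplateTree`, `ProsodyTemplate`, and `mean_durations`, the tone-combination strings such as `"3-4"`, the POS tags `"N"`/`"V"`, the position labels, the numeric values, and the back-off rule (return every template stored under the deepest matching prefix of the key) are all assumptions.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Tuple, Union

# Hypothetical key order: (tone combination, word length, POS tag, position in phrase).
FeatureKey = Tuple[str, int, str, str]


@dataclass
class ProsodyTemplate:
    """Prosodic features of one word, one entry per syllable (illustrative units)."""
    pitch_contours: List[List[float]]  # sampled F0 values (Hz) for each syllable
    energies: List[float]              # average energy for each syllable
    durations_ms: List[float]          # syllable durations in milliseconds


Node = Union[Dict, List[ProsodyTemplate]]


def _collect(node: Node) -> List[ProsodyTemplate]:
    """Gather every template stored at or below a node."""
    if isinstance(node, list):
        return list(node)
    found: List[ProsodyTemplate] = []
    for child in node.values():
        found.extend(_collect(child))
    return found


class TemplateTree:
    """Nested dictionaries, one level per linguistic feature; leaves hold template lists."""

    def __init__(self) -> None:
        self.root: Dict = {}

    def insert(self, key: FeatureKey, template: ProsodyTemplate) -> None:
        node = self.root
        for feature in key[:-1]:
            node = node.setdefault(feature, {})
        node.setdefault(key[-1], []).append(template)

    def lookup(self, key: FeatureKey) -> List[ProsodyTemplate]:
        """Return templates under the deepest node matching a prefix of `key`.

        Backing off to a shorter prefix is an assumption of this sketch.
        An empty list means not even the tone combination was seen.
        """
        node: Node = self.root
        for feature in key:
            if not isinstance(node, dict) or feature not in node:
                break
            node = node[feature]
        if node is self.root:
            return []
        return _collect(node)


def mean_durations(templates: List[ProsodyTemplate]) -> List[float]:
    """Average syllable durations position by position across templates."""
    return [mean(ds) for ds in zip(*(t.durations_ms for t in templates))]


# Usage with made-up numbers for a two-syllable word with tones 3 and 4.
tree = TemplateTree()
tree.insert(("3-4", 2, "N", "initial"),
            ProsodyTemplate([[180.0, 160.0, 175.0], [230.0, 170.0]], [0.6, 0.7], [210.0, 250.0]))
tree.insert(("3-4", 2, "N", "final"),
            ProsodyTemplate([[170.0, 150.0, 165.0], [220.0, 160.0]], [0.5, 0.6], [230.0, 310.0]))

# Exact match on all four features.
print(mean_durations(tree.lookup(("3-4", 2, "N", "initial"))))  # [210.0, 250.0]
# Unseen position: backs off to the POS level and averages both templates.
print(mean_durations(tree.lookup(("3-4", 2, "N", "medial"))))   # [220.0, 280.0]
```

Backing off to a shallower node trades specificity for coverage when a feature combination is missing from the database. The abstract does not say whether the original system does this or instead relies on its template-selection module to choose among candidates.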
Pages: 219–237
Number of pages: 19