Improving Mandarin Prosody Generation Using Alternative Smoothing Techniques

Cited by: 2
Authors
Huang, Yi-Chin [1]
Wu, Chung-Hsien [1]
Weng, Si-Ting [2]
Affiliations
[1] Natl Cheng Kung Univ, Dept Comp Sci & Informat Engn, Tainan 701, Taiwan
[2] Natl Cheng Kung Univ, Dept Med Informat, Tainan 701, Taiwan
Keywords
Fujisaki model; hierarchical prosodic pattern; Mandarin prosodic structure; natural speech synthesis; SPEECH SYNTHESIS; HMM; ALGORITHMS; CONTOURS
DOI
10.1109/TASLP.2016.2588727
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Prosody plays a vital role in conveying both communicative meaning and speaking style in speech communication. In recent years, the Hidden Markov Model (HMM)-based speech synthesis system (HTS) has been developed successfully and can synthesize stable, smooth speech. However, the prosody of the synthesized speech suffers from over-smoothing, so a better prosodic model is required to improve the natural variability of the synthesized speech. This study proposes a hybrid method that alleviates the problem by combining statistical and template-based unit selection methods. First, a two-level clustering approach is proposed to obtain representative prosodic patterns (denoted as codewords) of the hierarchical prosodic structure modeled by a modified Fujisaki model. The prosodic codewords are then used to represent the prosody of each sentence in a parallel corpus consisting of a real speech corpus and its synthesized counterpart obtained from the HTS. The synthesized utterance is then used as a query to retrieve the prosodic codewords of the utterances in the synthesized corpus. The retrieved synthesized prosodic codewords are mapped to the prosodic codewords of the real speech using linear mapping rules obtained from the parallel corpus. Prosodic codeword language models for the prosodic word and the prosodic phrase are employed, respectively, to choose the optimal codeword sequence of the real speech. Finally, the most likely sequence of prosodic codewords is obtained based on a NURBS-based continuity measure for synthesizing speech with natural prosody. The results of subjective and objective tests demonstrate that the proposed prosodic model substantially improves the naturalness of the intonation of the synthesized speech compared with that of the HMM-based method.
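For readers unfamiliar with the Fujisaki model named in the abstract, the following is a minimal sketch of the standard (unmodified) formulation, in which log F0 is a base frequency plus phrase and accent components. It is not the modified variant used in the paper, whose details the abstract does not give. The function names and the default values of alpha, beta, and gamma are illustrative assumptions.

```python
import math


def phrase_response(t, alpha=2.0):
    """Impulse response of the phrase control mechanism (zero before onset)."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t, beta=20.0, gamma=0.9):
    """Step response of the accent control mechanism, capped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def fujisaki_f0(t, fb, phrases, accents, alpha=2.0, beta=20.0, gamma=0.9):
    """Return F0 in Hz at time t (seconds) under the standard Fujisaki model.

    fb      -- base frequency in Hz
    phrases -- list of (onset_time, amplitude) phrase commands
    accents -- list of (onset_time, offset_time, amplitude) accent commands
    """
    log_f0 = math.log(fb)
    for t0, ap in phrases:
        log_f0 += ap * phrase_response(t - t0, alpha)
    for t1, t2, aa in accents:
        rise = accent_response(t - t1, beta, gamma)
        fall = accent_response(t - t2, beta, gamma)
        log_f0 += aa * (rise - fall)
    return math.exp(log_f0)


# Illustrative values only: one phrase command and one accent command,
# sampled every 10 ms over 1.5 s.
contour = [
    fujisaki_f0(i * 0.01, 120.0, [(0.0, 0.5)], [(0.3, 0.6, 0.4)])
    for i in range(150)
]
```

In this formulation each phrase command adds a slowly decaying hump and each accent command adds a local plateau, so a set of command parameters compactly describes a pitch pattern. That compactness is what makes clustering such patterns into codewords plausible, although the clustering procedure itself is not sketched here.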
Pages: 1897-1907
Page count: 11