Improving Mandarin Prosody Generation Using Alternative Smoothing Techniques

Cited by: 2
Authors
Huang, Yi-Chin [1]
Wu, Chung-Hsien [1]
Weng, Si-Ting [2]
Affiliations
[1] National Cheng Kung University, Department of Computer Science and Information Engineering, Tainan 701, Taiwan
[2] National Cheng Kung University, Department of Medical Informatics, Tainan 701, Taiwan
Keywords
Fujisaki model; hierarchical prosodic pattern; Mandarin prosodic structure; natural speech synthesis; SPEECH SYNTHESIS; HMM; ALGORITHMS; CONTOURS
DOI
10.1109/TASLP.2016.2588727
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Prosody plays a vital role in conveying both communicative meaning and specific speaking styles in speech communication. In recent years, the hidden Markov model (HMM)-based speech synthesis system (HTS) has been developed successfully and can synthesize stable, smooth speech. However, the prosody of the synthesized speech suffers from over-smoothing, so a better prosodic model is required to improve its natural variability. This study alleviates the problem with a hybrid method that combines statistical and template-based unit selection methods. First, a two-level clustering approach is proposed to obtain representative prosodic patterns (denoted codewords) of the hierarchical prosodic structure, which is modeled by a modified Fujisaki model. The prosodic codewords are then used to represent the prosody of each sentence in a parallel corpus consisting of a real speech corpus and its synthesized counterpart obtained from the HTS. A synthesized speech utterance is then used as a query to retrieve the prosodic codewords of utterances in the synthesized corpus. The retrieved synthesized prosodic codewords are mapped to prosodic codewords of real speech using linear mapping rules obtained from the parallel corpus. Prosodic codeword language models for the prosodic word and the prosodic phrase are each employed to choose the optimal codeword sequence of the real speech. Finally, the most likely sequence of prosodic codewords is obtained based on a non-uniform rational B-spline (NURBS)-based continuity measure and is used to synthesize speech with natural prosody. Subjective and objective tests demonstrate that the proposed prosodic model substantially improves the naturalness of the intonation of the synthesized speech compared with the HMM-based method.
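The final selection step described in the abstract, choosing a codeword sequence that is both likely under a codeword language model and continuous at the joins, can be illustrated with a small dynamic-programming search. The following is only a sketch under stated assumptions, not the authors' implementation: the function `select_codewords`, the uniform bigram score, the toy `contours` table, and the endpoint-gap cost are all hypothetical, and the simple endpoint gap stands in for the NURBS-based continuity measure, which the abstract does not specify in detail.

```python
import math


def select_codewords(candidates, bigram_logprob, boundary_gap, weight=1.0):
    """Viterbi search over per-unit candidate codewords (illustrative only).

    candidates[t]         : codeword ids allowed for prosodic unit t
    bigram_logprob(p, c)  : hypothetical language-model score, log P(c | p)
    boundary_gap(p, c)    : hypothetical non-negative discontinuity cost at the
                            join (a stand-in for the NURBS-based measure)
    """
    # best[c] = (score of the best path ending in codeword c, that path)
    best = {c: (0.0, [c]) for c in candidates[0]}
    for t in range(1, len(candidates)):
        nxt = {}
        for cur in candidates[t]:
            # Extend every surviving path by `cur` and keep the best one.
            score, path = max(
                (
                    (s + bigram_logprob(p, cur) - weight * boundary_gap(p, cur), pth)
                    for p, (s, pth) in best.items()
                ),
                key=lambda item: item[0],
            )
            nxt[cur] = (score, path + [cur])
        best = nxt
    return max(best.values(), key=lambda item: item[0])[1]


# Toy codebook (assumed values): codeword id -> (start, end) of a log-F0 contour.
contours = {0: (5.0, 5.2), 1: (5.2, 5.6), 2: (5.6, 5.1)}


def uniform_bigram(prev, cur):
    # Assumption: a flat language model, so only continuity decides here.
    return math.log(1.0 / len(contours))


def endpoint_gap(prev, cur):
    # Mismatch between the end of one contour and the start of the next.
    return abs(contours[prev][1] - contours[cur][0])


candidates = [[0, 1], [1, 2], [0, 2]]
result = select_codewords(candidates, uniform_bigram, endpoint_gap)
print(result)  # [0, 1, 2]
```

With a flat language model the search picks the sequence whose contours join without gaps. A trained bigram model would trade continuity against codeword likelihood through the `weight` parameter.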
Pages: 1897-1907
Page count: 11