Parameterization of Excitation Signal for Improving the Quality of HMM-Based Speech Synthesis System

Cited by: 5
Authors
Narendra, N. P. [1]
Rao, K. Sreenivasa [1]
Affiliations
[1] Indian Inst Technol Kharagpur, Sch Informat Technol, Kharagpur 721302, W Bengal, India
Keywords
HMM-based speech synthesis; Deterministic plus noise model; Excitation model; Residual frame; PCA; Residual codebook; Coder
DOI
10.1007/s00034-016-0476-3
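Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract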
This paper proposes a new approach to parameterizing the excitation signal for improving the quality of an HMM-based speech synthesis system. The proposed method models the excitation or residual signal by segregating the regions of the residual signal based on their perceptual importance. Initially, a study on the characteristics of the residual signal around the glottal closure instant (GCI) is performed using principal component analysis (PCA). Based on the present study and on previous literature (Adiga and Prasanna in Proceedings of Interspeech, pp 1677-1681, 2013; Cabral in Proceedings of Interspeech, pp 1082-1086, 2013), the segment of the residual signal around the GCI, which carries perceptually important information, is treated as the deterministic component, and the remaining part of the residual signal is treated as the noise component. The deterministic component is compactly represented using PCA coefficients (with about 95% accuracy), and the noise component is parameterized in terms of spectral and amplitude envelopes. The proposed excitation modeling approach is incorporated in the HMM-based speech synthesis system. Subjective evaluation results show a significant improvement in the quality of speech synthesized by the proposed method for both female and male speakers, compared to three existing excitation modeling methods. Accurate parameterization of the segment of the residual signal around the GCI resulted in the improvement in the quality of the synthesized speech. Synthesized speech samples of the proposed and existing source models are made available online at http://www.sit.iitkgp.ernet.in/~ksrao/parametric-hts/pcd-hts.html.
Pages: 3650-3673
Number of pages: 24
Related papers
50 in total
  • [1] Parameterization of Excitation Signal for Improving the Quality of HMM-Based Speech Synthesis System
    N. P. Narendra
    K. Sreenivasa Rao
    [J]. Circuits, Systems, and Signal Processing, 2017, 36 : 3650 - 3673
  • [2] Parameterization of Vocal Fry in HMM-Based Speech Synthesis
    Silen, Hanna
    Helander, Elina
    Nurminen, Jani
    Gabbouj, Moncef
    [J]. INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 1735 - +
  • [3] Generation of creaky voice for improving the quality of HMM-based speech synthesis
    Narendra, N. P.
    Rao, K. Sreenivasa
    [J]. COMPUTER SPEECH AND LANGUAGE, 2017, 42 : 38 - 58
  • [4] A trainable excitation model for HMM-based speech synthesis
    Maia, R.
    Toda, T.
    Zen, H.
    Nankaku, Y.
    Tokuda, K.
    [J]. INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 1125 - +
  • [5] Inverse filter based excitation model for HMM-based speech synthesis system
    Reddy, Mittapalle Kiran
    Rao, Krothapalli Sreenivasa
    [J]. IET SIGNAL PROCESSING, 2018, 12 (04) : 544 - 548
  • [6] An HMM-based Vietnamese Speech Synthesis System
    Vu, Thang Tat
    Luong, Mai Chi
    Nakamura, Satoshi
    [J]. ORIENTAL COCOSDA 2009 - INTERNATIONAL CONFERENCE ON SPEECH DATABASE AND ASSESSMENTS, 2009, : 116 - +
  • [7] An HMM-based Cantonese Speech Synthesis System
    Wang, Xin
    Wu, Zhiyong
    [J]. 2012 IEEE GLOBAL HIGH TECH CONGRESS ON ELECTRONICS (GHTCE), 2012,
  • [8] Statistical Approaches to Excitation Modeling in HMM-Based Speech Synthesis
    Sung, June Sig
    Hong, Doo Hwa
    Koo, Hyun Woo
    Kim, Nam Soo
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2013, E96D (02) : 379 - 382
  • [9] Two-band excitation for HMM-based speech synthesis
    Kim, Sang-Jin
    Hahn, Minsoo
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2007, E90D (01) : 378 - 381
  • [10] Improved Training of Excitation for HMM-based Parametric Speech Synthesis
    Shiga, Yoshinori
    Toda, Tomoki
    Sakai, Shinsuke
    Kawai, Hisashi
    [J]. 11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 809 - 812