Parameterization of Excitation Signal for Improving the Quality of HMM-Based Speech Synthesis System

Cited by: 5

Authors:

Narendra, N. P. [1]

Rao, K. Sreenivasa [1]

Affiliation:

[1] Indian Inst Technol Kharagpur, Sch Informat Technol, Kharagpur 721302, W Bengal, India

Source:

CIRCUITS SYSTEMS AND SIGNAL PROCESSING | 2017 / Vol. 36 / Issue 09

Keywords:

HMM-based speech synthesis; Deterministic plus noise model; Excitation model; Residual frame; PCA; RESIDUAL CODEBOOK; CODER;

DOI:

10.1007/s00034-016-0476-3

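Chinese Library Classification:

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Discipline classification codes: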

0808 ; 0809 ;

Abstract:

This paper proposes a new approach of parameterizing the excitation signal for improving the quality of HMM-based speech synthesis system. The proposed method tries to model the excitation or residual signal by segregating the regions of the residual signal based on their perceptual importance. Initially, a study on the characteristics of the residual signal around glottal closure instant (GCI) is performed using principal component analysis (PCA). Based on the present study, and from the previous literature (Adiga and Prasanna in Proceedings of Interspeech, pp 1677-1681, 2013; Cabral in Proceedings of Interspeech, pp 1082-1086, 2013), it is concluded that the segment of the residual signal around GCI which carries perceptually important information is considered as the deterministic component and the remaining part of the residual signal is considered as the noise component. The deterministic component is compactly represented using PCA coefficients (with about 95% accuracy), and the noise component is parameterized in terms of spectral and amplitude envelopes. The proposed excitation modeling approach is incorporated in the HMM-based speech synthesis system. Subjective evaluation results show a significant improvement of quality for both female and male speakers' speech synthesized by the proposed method, compared to three existing excitation modeling methods. Accurate parameterization of the segment of the residual signal around GCI resulted in the improvement of the quality of the synthesized speech. Synthesized speech samples of the proposed and existing source models are made available online at http://www.sit.iitkgp.ernet.in/~ksrao/parametric-hts/pcd-hts.html.
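
The split described in the abstract, a segment around each GCI treated as the deterministic component and the remainder as the noise component, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `split_residual`, the `half_width` parameter, and the toy signal are hypothetical, and the abstract does not state the segment length or how GCIs are detected. In the paper's pipeline the extracted segments would then be represented with PCA coefficients and the remainder with spectral and amplitude envelopes; neither step is shown here.

```python
from typing import List, Tuple


def split_residual(
    residual: List[float], gcis: List[int], half_width: int
) -> Tuple[List[List[float]], List[float]]:
    """Split a residual signal into GCI-centred segments and the remainder.

    half_width is a hypothetical parameter (samples kept on each side of a
    GCI); the abstract does not specify the actual segment length.
    """
    n = len(residual)
    in_segment = [False] * n
    segments = []
    for g in gcis:
        # Clip the window to the signal boundaries.
        start = max(0, g - half_width)
        stop = min(n, g + half_width + 1)
        segments.append(residual[start:stop])
        for i in range(start, stop):
            in_segment[i] = True
    # Samples outside every GCI window form the noise part; samples inside
    # a window are zeroed so the output keeps the original length.
    noise = [0.0 if in_segment[i] else residual[i] for i in range(n)]
    return segments, noise


# Toy residual with two assumed GCI positions (indices 2 and 7).
residual = [0.1, -0.2, 0.9, -0.3, 0.05, 0.0, -0.1, 1.1, -0.4, 0.02]
segments, noise = split_residual(residual, gcis=[2, 7], half_width=1)
```

Zeroing the windowed samples rather than deleting them keeps the noise part time-aligned with the original signal, which is a design choice of this sketch only.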

Citation

Pages: 3650-3673

Page count: 24

50 items in total

[1] Parameterization of Excitation Signal for Improving the Quality of HMM-Based Speech Synthesis System
N. P. Narendra
K. Sreenivasa Rao
[J]. Circuits, Systems, and Signal Processing, 2017, 36 : 3650 - 3673
[2] Parameterization of Vocal Fry in HMM-Based Speech Synthesis
Silen, Hanna
Helander, Elina
Nurminen, Jani
Gabbouj, Moncef
[J]. INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 1735 - +
[3] Generation of creaky voice for improving the quality of HMM-based speech synthesis
Narendra, N. P.
Rao, K. Sreenivasa
[J]. COMPUTER SPEECH AND LANGUAGE, 2017, 42 : 38 - 58
[4] A trainable excitation model for HMM-based speech synthesis
Maia, R.
Toda, T.
Zen, H.
Nankaku, Y.
Tokuda, K.
[J]. INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 1125 - +
[5] Inverse filter based excitation model for HMM-based speech synthesis system
Reddy, Mittapalle Kiran
Rao, Krothapalli Sreenivasa
[J]. IET SIGNAL PROCESSING, 2018, 12 (04) : 544 - 548
[6] An HMM-based Vietnamese Speech Synthesis System
Vu, Thang Tat
Luong, Mai Chi
Nakamura, Satoshi
[J]. ORIENTAL COCOSDA 2009 - INTERNATIONAL CONFERENCE ON SPEECH DATABASE AND ASSESSMENTS, 2009, : 116 - +
[7] An HMM-based Cantonese Speech Synthesis System
Wang, Xin
Wu, Zhiyong
[J]. 2012 IEEE GLOBAL HIGH TECH CONGRESS ON ELECTRONICS (GHTCE), 2012,
[8] Statistical Approaches to Excitation Modeling in HMM-Based Speech Synthesis
Sung, June Sig
Hong, Doo Hwa
Koo, Hyun Woo
Kim, Nam Soo
[J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2013, E96D (02) : 379 - 382
[9] Two-band excitation for HMM-based speech synthesis
Kim, Sang-Jin
Hahn, Minsoo
[J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2007, E90D (01) : 378 - 381
[10] Improved Training of Excitation for HMM-based Parametric Speech Synthesis
Shiga, Yoshinori
Toda, Tomoki
Sakai, Shinsuke
Kawai, Hisashi
[J]. 11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 809 - 812
