Excitation modelling using epoch features for statistical parametric speech synthesis

Cited by: 7
Authors
Reddy, M. Kiran [1 ]
Rao, K. Sreenivasa [1 ]
Affiliations
[1] Indian Inst Technol, Dept Comp Sci & Engn, Kharagpur, W Bengal, India
Source
Keywords
Speech synthesis; Hidden Markov model; Deep neural networks; Epoch parameters; Source features; Excitation modelling; SYNTHESIS SYSTEM; EXTRACTION; CODEBOOK; BAND;
DOI
10.1016/j.csl.2019.101029
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, a novel excitation modelling method is proposed for improving the naturalness of statistical parametric speech synthesis (SPSS). In the proposed approach, the excitation or residual signal is parameterized by using features extracted from the epochs. The epoch parameters used in this work are epoch strength and sharpness. These features are modeled in the statistical framework along with other parameters. During synthesis, the excitation signal is constructed by imposing the generated epoch parameters on the natural instances of excitation signal. The effectiveness of the proposed method is evaluated in the framework of hidden Markov model (HMM)-based and deep neural network (DNN)-based SPSS. Evaluation results have shown that the SPSS systems developed using the proposed excitation model are capable of synthesizing more natural sounding speech compared to the ones based on two state-of-the-art excitation modelling approaches. (C) 2019 Elsevier Ltd. All rights reserved.
Pages: 13
Related papers
50 in total
  • [1] Acoustic Features Modelling for Statistical Parametric Speech Synthesis: A Review
    Adiga, Nagaraj
    Prasanna, S. R. M.
    [J]. IETE TECHNICAL REVIEW, 2019, 36 (02) : 130 - 149
  • [2] Duration modelling and evaluation for Arabic statistical parametric speech synthesis
    Zangar, Imene
    Mnasri, Zied
    Colotte, Vincent
    Jouvet, Denis
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (06) : 8331 - 8353
  • [3] Duration modelling and evaluation for Arabic statistical parametric speech synthesis
    Imene Zangar
    Zied Mnasri
    Vincent Colotte
    Denis Jouvet
    [J]. Multimedia Tools and Applications, 2021, 80 : 8331 - 8353
  • [4] VOICE SOURCE MODELLING USING DEEP NEURAL NETWORKS FOR STATISTICAL PARAMETRIC SPEECH SYNTHESIS
    Raitio, Tuomo
    Lu, Heng
    Kane, John
    Suni, Antti
    Vainio, Martti
    King, Simon
    Alku, Paavo
    [J]. 2014 PROCEEDINGS OF THE 22ND EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2014 : 2290 - 2294
  • [5] Improved voicing decision using glottal activity features for statistical parametric speech synthesis
    Adiga, Nagaraj
    Khonglah, Banriskhem K.
    Prasanna, S. R. Mahadeva
    [J]. DIGITAL SIGNAL PROCESSING, 2017, 71 : 131 - 143
  • [6] On the impact of excitation and spectral parameters for expressive statistical parametric speech synthesis
    Maia, Ranniery
    Akamine, Masami
    [J]. COMPUTER SPEECH AND LANGUAGE, 2014, 28 (05) : 1209 - 1232
  • [7] Statistical parametric speech synthesis
    Black, Alan W.
    Zen, Heiga
    Tokuda, Keiichi
    [J]. 2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007 : 1229 - +
  • [8] Statistical parametric speech synthesis
    Zen, Heiga
    Tokuda, Keiichi
    Black, Alan W.
    [J]. SPEECH COMMUNICATION, 2009, 51 (11) : 1039 - 1064
  • [9] Direct Modelling of Magnitude and Phase Spectra for Statistical Parametric Speech Synthesis
    Espic, Felipe
    Valentini-Botinhao, Cassia
    King, Simon
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017 : 1383 - 1387
  • [10] Statistical parametric speech synthesis with a novel codebook-based excitation model
    Csapo, Tamas Gabor
    Nemeth, Geza
    [J]. INTELLIGENT DECISION TECHNOLOGIES-NETHERLANDS, 2014, 8 (04) : 289 - 299