On the impact of excitation and spectral parameters for expressive statistical parametric speech synthesis

Cited by: 4
Authors
Maia, Ranniery [1 ]
Akamine, Masami [2 ]
Affiliations
[1] Toshiba Res Europe Ltd, Cambridge Res Lab, Cambridge CB4 0GZ, England
[2] Toshiba Co Ltd, Corp Res & Dev Ctr, Saiwai Ku, Kawasaki, Kanagawa 2128582, Japan
Source
COMPUTER SPEECH AND LANGUAGE | 2014, Vol. 28, No. 5
Keywords
Speech synthesis; Statistical parametric speech synthesis; Expressive speech synthesis; Speech parameterization; REPRESENTATION;
DOI
10.1016/j.csl.2013.10.001
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents a study on the importance of short-term speech parameterizations for expressive statistical parametric synthesis. Assuming a source-filter model of speech production, the analysis is conducted over spectral parameters, here defined as features which represent a minimum-phase synthesis filter, and some excitation parameters, which are features used to construct a signal that is fed to the minimum-phase synthesis filter to generate speech. In the first part, different spectral and excitation parameters that are applicable to statistical parametric synthesis are tested to determine which ones are the most emotion dependent. The analysis is performed through two methods proposed to measure the relative emotion dependency of each feature: one based on K-means clustering, and another based on Gaussian mixture modeling for emotion identification. Two commonly used forms of parameters for the short-term speech spectral envelope, the Mel cepstrum and the Mel line spectrum pairs, are utilized. As excitation parameters, the anti-causal cepstrum, the time-smoothed group delay, and band-aperiodicity coefficients are considered. According to the analysis, the line spectrum pairs are the most emotion-dependent parameters. Among the excitation features, the band-aperiodicity coefficients present the highest correlation with the speaker's emotion. The most emotion-dependent parameters according to this analysis were selected to train an expressive statistical parametric synthesizer using a speaker and language factorization framework. Subjective test results indicate that the considered spectral parameters have a greater impact on the synthesized speech emotion than the excitation ones. (C) 2013 Elsevier Ltd. All rights reserved.
Pages: 1209-1232
Number of pages: 24
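The abstract mentions a K-means-based measure of how strongly a feature depends on emotion but gives no details. One possible reading, offered here only as an illustrative sketch and not as the authors' method, is to cluster feature frames without using their labels and then check how well the clusters line up with the emotion labels (cluster purity). The function names `kmeans` and `cluster_purity`, the synthetic two-dimensional data, and the choice of purity as the score are all assumptions made for this example.

```python
import numpy as np


def kmeans(x, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster index per row of x."""
    rng = np.random.default_rng(seed)
    # Initialise centres from k distinct data points (fancy indexing copies).
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every frame to every centre: (n, k).
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dist.argmin(axis=1)
        for j in range(k):
            members = x[assign == j]
            if len(members):  # leave an empty cluster's centre unchanged
                centers[j] = members.mean(axis=0)
    # Final assignment against the last updated centres.
    dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)


def cluster_purity(assign, labels):
    """Fraction of frames that carry the majority label of their cluster."""
    matched = 0
    for j in np.unique(assign):
        _, counts = np.unique(labels[assign == j], return_counts=True)
        matched += counts.max()
    return matched / len(labels)


# Synthetic stand-ins for two feature streams (hypothetical data):
# 3 "emotions", 200 frames each, 2-dimensional features.
rng = np.random.default_rng(1)
labels = np.repeat(np.arange(3), 200)
emotion_means = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])

# Stream A: the feature mean shifts with the emotion label.
dependent = emotion_means[labels] + rng.normal(size=(600, 2))
# Stream B: the feature distribution ignores the emotion label.
independent = rng.normal(size=(600, 2))

purity_dependent = cluster_purity(kmeans(dependent, k=3), labels)
purity_independent = cluster_purity(kmeans(independent, k=3), labels)

print(f"purity, emotion-dependent stream:   {purity_dependent:.3f}")
print(f"purity, emotion-independent stream: {purity_independent:.3f}")
```

Under this reading, a feature stream whose clusters align with the emotion labels scores a higher purity than one whose distribution does not change with emotion. Real use would replace the synthetic arrays with frame-level features such as Mel cepstra or band-aperiodicity coefficients.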