STATISTICAL PARAMETRIC SPEECH SYNTHESIS USING DEEP NEURAL NETWORKS

Authors
Zen, Heiga [1]
Senior, Andrew [1]
Schuster, Mike [1]
Affiliation
[1] Google, Washington, DC USA
Keywords
Statistical parametric speech synthesis; Hidden Markov model; Deep neural network
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Conventional approaches to statistical parametric speech synthesis typically use decision-tree-clustered, context-dependent hidden Markov models (HMMs) to represent the probability densities of speech parameters given texts. Speech parameters are generated from these probability densities so as to maximize their output probabilities, and a speech waveform is then reconstructed from the generated parameters. This approach is reasonably effective but has some limitations; for example, decision trees are inefficient at modeling complex context dependencies. This paper examines an alternative scheme based on a deep neural network (DNN), in which the relationship between input texts and their acoustic realizations is modeled by a DNN. Using a DNN can address some limitations of the conventional approach. Experimental results show that the DNN-based systems outperformed the HMM-based systems with similar numbers of parameters.
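The abstract states that a DNN models the mapping from input text to its acoustic realization. The following is a minimal forward-pass sketch of that idea, not the paper's implementation. It assumes the text has already been converted into a numeric linguistic feature vector (for example, binary answers to context questions plus a few numeric positional features) and that the network predicts one real-valued acoustic feature vector per frame. The sigmoid hidden units, linear output layer, layer sizes, and all function names (`init_layer`, `dnn_forward`, `mse`) are illustrative assumptions.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Tuple[Matrix, Vector]  # (weights with one row per output unit, biases)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def affine(w: Matrix, b: Vector, x: Vector) -> Vector:
    # Computes W x + b for a single input vector x.
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(w, b)]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    # Hypothetical initialization: small Gaussian weights, zero biases.
    w = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return w, b


def dnn_forward(layers: List[Layer], x: Vector) -> Vector:
    # Assumed architecture: sigmoid hidden layers, then a linear output layer
    # that predicts real-valued acoustic features for one frame.
    h = x
    for w, b in layers[:-1]:
        h = [sigmoid(v) for v in affine(w, b, h)]
    w_out, b_out = layers[-1]
    return affine(w_out, b_out, h)


def mse(pred: Vector, target: Vector) -> float:
    # Mean squared error between predicted and reference acoustic features.
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


# Usage with hypothetical sizes: 10 linguistic inputs, two hidden layers of 8 units,
# and 5 acoustic outputs per frame.
rng = random.Random(0)
sizes = [10, 8, 8, 5]
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
linguistic_features = [1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.3, 0.7]
acoustic_prediction = dnn_forward(layers, linguistic_features)
```

The output layer is kept linear because the targets are real-valued acoustic parameters rather than class labels. Training would then adjust the weights to reduce a loss such as `mse` between predicted and natural acoustic features.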
Pages: 7962-7966
Number of pages: 5