DEEP MIXTURE DENSITY NETWORKS FOR ACOUSTIC MODELING IN STATISTICAL PARAMETRIC SPEECH SYNTHESIS

Cited by: 0
Authors
Zen, Heiga [1]
Senior, Andrew [1]
Affiliation
[1] Google, Mountain View, CA 94043 USA
Keywords
Statistical parametric speech synthesis; hidden Markov models; deep neural networks; mixture density networks; SYNTHESIS SYSTEM; HMM;
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Statistical parametric speech synthesis (SPSS) using deep neural networks (DNNs) has shown its potential to produce natural-sounding synthesized speech. However, the current implementation of DNN-based acoustic modeling for speech synthesis has limitations, such as the unimodal nature of its objective function and its inability to predict variances. To address these limitations, this paper investigates the use of a mixture density output layer, which can estimate full probability density functions over real-valued output features conditioned on the corresponding input features. Objective and subjective evaluations show that the mixture density output layer improves both the prediction accuracy of acoustic features and the naturalness of the synthesized speech.
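As a rough illustration of what a mixture density output layer computes, the sketch below maps raw output-layer activations to the parameters of a one-dimensional Gaussian mixture and evaluates the resulting conditional density. It assumes the standard mixture density network parameterization (softmax for mixture weights, identity for means, exponential for standard deviations); the function names, the one-dimensional target, and the example activations are hypothetical and not taken from the paper.

```python
import math


def mdn_params(activations, num_mix):
    """Split raw activations into GMM parameters for a 1-D target.

    Assumed layout (hypothetical): [weight logits | means | log std devs].
    """
    assert len(activations) == 3 * num_mix
    z_w = activations[:num_mix]
    z_mu = activations[num_mix:2 * num_mix]
    z_sigma = activations[2 * num_mix:]

    # Softmax (shifted by the max for numerical stability):
    # weights are positive and sum to one.
    shift = max(z_w)
    exp_w = [math.exp(z - shift) for z in z_w]
    total = sum(exp_w)
    weights = [e / total for e in exp_w]

    # Means are used as-is; exp keeps standard deviations positive.
    means = list(z_mu)
    sigmas = [math.exp(z) for z in z_sigma]
    return weights, means, sigmas


def mdn_density(y, weights, means, sigmas):
    """Mixture density p(y | x), given the parameters predicted for input x."""
    return sum(
        w * math.exp(-0.5 * ((y - mu) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
        for w, mu, s in zip(weights, means, sigmas)
    )


def mdn_nll(y, weights, means, sigmas):
    """Negative log-likelihood of one target value: the training loss."""
    return -math.log(mdn_density(y, weights, means, sigmas))


# Example: two components with equal weight, means -1 and +1, unit variance.
activations = [0.0, 0.0, -1.0, 1.0, 0.0, 0.0]
weights, means, sigmas = mdn_params(activations, num_mix=2)
loss = mdn_nll(0.0, weights, means, sigmas)
```

With two well-separated components the predicted density is bimodal, and the per-component standard deviations give the variance estimates that a network trained with a plain squared-error objective does not provide.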
Pages: 5
Related papers
50 in total
  • [1] GATING RECURRENT MIXTURE DENSITY NETWORKS FOR ACOUSTIC MODELING IN STATISTICAL PARAMETRIC SPEECH SYNTHESIS
    Wang, Wenfu
    Xu, Shuang
    Xu, Bo
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5520 - 5524
  • [2] A Kullback-Leibler Divergence Based Recurrent Mixture Density Network for Acoustic Modeling in Emotional Statistical Parametric Speech Synthesis
    An, Xiaochun
    Zhang, Yuchao
    Liu, Bing
    Xue, Liumeng
    Xie, Lei
    PROCEEDINGS OF THE JOINT WORKSHOP OF THE 4TH WORKSHOP ON AFFECTIVE SOCIAL MULTIMEDIA COMPUTING AND FIRST MULTI-MODAL AFFECTIVE COMPUTING OF LARGE-SCALE MULTIMEDIA DATA (ASMMC-MMAC'18), 2018, : 1 - 6
  • [3] Transfer Learning based Progressive Neural Networks for Acoustic Modeling in Statistical Parametric Speech Synthesis
    Fu, Ruibo
    Tao, Jianhua
    Zheng, Yibin
    Wen, Zhengqi
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 907 - 911
  • [4] STATISTICAL PARAMETRIC SPEECH SYNTHESIS USING DEEP NEURAL NETWORKS
    Zen, Heiga
    Senior, Andrew
    Schuster, Mike
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7962 - 7966
  • [5] DIRECTLY MODELING SPEECH WAVEFORMS BY NEURAL NETWORKS FOR STATISTICAL PARAMETRIC SPEECH SYNTHESIS
    Tokuda, Keiichi
    Zen, Heiga
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4215 - 4219
  • [6] Deep Elman recurrent neural networks for statistical parametric speech synthesis
    Achanta, Sivanand
    Gangashetty, Suryakanth V.
    SPEECH COMMUNICATION, 2017, 93 : 31 - 42
  • [7] Deep Learning for Acoustic Modeling in Parametric Speech Generation
    Ling, Zhen-Hua
    Kang, Shi-Yin
    Zen, Heiga
    Senior, Andrew
    Schuster, Mike
    Qian, Xiao-Jun
    Meng, Helen
    Deng, Li
    IEEE SIGNAL PROCESSING MAGAZINE, 2015, 32 (03) : 35 - 52
  • [8] Modeling Spectral Envelopes Using Restricted Boltzmann Machines and Deep Belief Networks for Statistical Parametric Speech Synthesis
    Ling, Zhen-Hua
    Deng, Li
    Yu, Dong
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2013, 21 (10) : 2129 - 2139
  • [9] VOICE SOURCE MODELLING USING DEEP NEURAL NETWORKS FOR STATISTICAL PARAMETRIC SPEECH SYNTHESIS
    Raitio, Tuomo
    Lu, Heng
    Kane, John
    Suni, Antti
    Vainio, Martti
    King, Simon
    Alku, Paavo
    2014 PROCEEDINGS OF THE 22ND EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2014, : 2290 - 2294
  • [10] Multiple Feed-forward Deep Neural Networks for Statistical Parametric Speech Synthesis
    Takaki, Shinji
    Kim, SangJin
    Yamagishi, Junichi
    Kim, JongJin
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2242 - 2246