STATISTICAL PARAMETRIC SPEECH SYNTHESIS USING DEEP NEURAL NETWORKS

Cited by: 0
Authors
Zen, Heiga [1]
Senior, Andrew [1]
Schuster, Mike [1]
Affiliation
[1] Google, Washington, DC USA
Keywords
Statistical parametric speech synthesis; Hidden Markov model; Deep neural network
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Conventional approaches to statistical parametric speech synthesis typically use decision tree-clustered context-dependent hidden Markov models (HMMs) to represent probability densities of speech parameters given texts. Speech parameters are generated from the probability densities so as to maximize their output probabilities, and a speech waveform is then reconstructed from the generated parameters. This approach is reasonably effective but has some limitations; for example, decision trees are inefficient at modeling complex context dependencies. This paper examines an alternative scheme based on a deep neural network (DNN), in which the relationship between input texts and their acoustic realizations is modeled by the DNN. Using a DNN can address some limitations of the conventional approach. Experimental results show that the DNN-based systems outperformed the HMM-based systems with similar numbers of parameters.
Pages: 7962-7966
Page count: 5
Related papers
50 in total
  • [21] Statistical parametric speech synthesis
    Zen, Heiga
    Tokuda, Keiichi
    Black, Alan W.
    SPEECH COMMUNICATION, 2009, 51 (11) : 1039 - 1064
  • [22] Speech watermarking using Deep Neural Networks
    Pavlovic, Kosta
    Kovacevic, Slavko
    Durovic, Igor
    2020 28TH TELECOMMUNICATIONS FORUM (TELFOR), 2020, : 292 - 295
  • [23] Investigating very deep highway networks for parametric speech synthesis
    Wang, Xin
    Takaki, Shinji
    Yamagishi, Junichi
    SPEECH COMMUNICATION, 2018, 96 : 1 - 9
  • [24] SAMPLERNN-BASED NEURAL VOCODER FOR STATISTICAL PARAMETRIC SPEECH SYNTHESIS
    Ai, Yang
    Wu, Hong-Chuan
    Ling, Zhen-Hua
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5659 - 5663
  • [25] An Investigation of Recurrent Neural Network Architectures for Statistical Parametric Speech Synthesis
    Achanta, Sivanand
    Godambe, Tejas
    Gangashetty, Suryakanth V.
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 859 - 863
  • [26] Neural Speech Embeddings for Speech Synthesis Based on Deep Generative Networks
    Lee, Seo-Hyun
    Lee, Young-Eun
    Kim, Soowon
    Ko, Byung-Kwan
    Kim, Jun-Young
    2024 12TH INTERNATIONAL WINTER CONFERENCE ON BRAIN-COMPUTER INTERFACE, BCI 2024, 2024,
  • [27] MULTI-CLASS LEARNING ALGORITHM FOR DEEP NEURAL NETWORK-BASED STATISTICAL PARAMETRIC SPEECH SYNTHESIS
    Song, Eunwoo
    Kang, Hong-Goo
    2016 24TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2016, : 1951 - 1955
  • [28] Statistical parametric speech synthesis for Arabic language using ANN
    Ilyes, Rebai
    BenAyed, Yassine
    2014 1ST INTERNATIONAL CONFERENCE ON ADVANCED TECHNOLOGIES FOR SIGNAL AND IMAGE PROCESSING (ATSIP 2014), 2014, : 452 - 457
  • [29] Statistical parametric speech synthesis using a hidden trajectory model
    Cai, Ming-Qi
    Ling, Zhen-Hua
    Dai, Li-Rong
    SPEECH COMMUNICATION, 2015, 72 : 149 - 159
  • [30] Statistical Parametric Speech Synthesis Using Generalized Distillation Framework
    Liu, Zheng-Chen
    Ling, Zhen-Hua
    Dai, Li-Rong
    IEEE SIGNAL PROCESSING LETTERS, 2018, 25 (05) : 695 - 699