STATISTICAL PARAMETRIC SPEECH SYNTHESIS USING DEEP NEURAL NETWORKS

Cited by: 0

Authors:

Zen, Heiga [1]

Senior, Andrew [1]

Schuster, Mike [1]

Affiliations:

[1] Google, Washington, DC USA

Source:

2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2013

Keywords:

Statistical parametric speech synthesis; Hidden Markov model; Deep neural network

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Conventional approaches to statistical parametric speech synthesis typically use decision tree-clustered context-dependent hidden Markov models (HMMs) to represent probability densities of speech parameters given texts. Speech parameters are generated from the probability densities to maximize their output probabilities, then a speech waveform is reconstructed from the generated parameters. This approach is reasonably effective but has a couple of limitations, e.g., decision trees are inefficient at modeling complex context dependencies. This paper examines an alternative scheme that is based on a deep neural network (DNN). The relationship between input texts and their acoustic realizations is modeled by a DNN. The use of the DNN can address some limitations of the conventional approach. Experimental results show that the DNN-based systems outperformed the HMM-based systems with similar numbers of parameters.


Pages: 7962-7966

Page count: 5
