Statistical parametric speech synthesis with a novel codebook-based excitation model

Cited by: 2

Authors:

Csapo, Tamas Gabor [1]

Nemeth, Geza [1]

Affiliations:

[1] Budapest Univ Technol & Econ, Dept Telecommun & Media Informat, Budapest, Hungary

Source:

INTELLIGENT DECISION TECHNOLOGIES-NETHERLANDS | 2014 / Vol. 8 / Issue 04

Keywords:

Text-to-speech synthesis; speech processing; excitation model; vocoding; parametric;

DOI:

10.3233/IDT-140197

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Speech synthesis is an important modality in Cognitive Infocommunications, which lies at the intersection of informatics and cognitive sciences. Statistical parametric methods have recently gained importance in speech synthesis. The speech signal is decomposed into parameters and later restored from them. The decomposition is implemented by speech coders. We apply a novel codebook-based speech coding method to model the excitation of speech. In the analysis stage, the speech signal is analyzed frame by frame and a codebook of pitch-synchronous excitations is built from the voiced parts. Timing, gain and harmonic-to-noise ratio parameters are extracted and fed into the machine learning stage of hidden Markov model-based speech synthesis. During the synthesis stage, the codebook is searched for a suitable element in each voiced frame, and these elements are concatenated to create the excitation signal, from which the final synthesized speech is created. Our initial experiments show that the model fits well in the statistical parametric speech synthesis framework and that in most cases it can synthesize speech of better quality than the traditional pulse-noise excitation. (This paper is an extended version of [10].)
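
The analysis and synthesis stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`build_codebook`, `select_entry`, `synthesize_excitation`), the Hann windowing, and the nearest-pitch-period selection rule are all assumptions made for illustration, and the harmonic-to-noise ratio parameter is omitted for brevity. The sketch assumes an excitation (residual) signal and its pitch marks are already available.

```python
import numpy as np


def build_codebook(residual, pitch_marks):
    """Analysis: cut windowed, two-period-long segments around each pitch mark.

    Hypothetical layout: every entry stores its pitch period (in samples),
    its original gain (RMS), and a unit-RMS waveform.
    """
    codebook = []
    for prev, cur, nxt in zip(pitch_marks, pitch_marks[1:], pitch_marks[2:]):
        segment = residual[prev:nxt] * np.hanning(nxt - prev)
        rms = np.sqrt(np.mean(segment ** 2))
        if rms > 0:  # skip silent segments, which cannot be normalized
            codebook.append(
                {"period": cur - prev, "gain": rms, "wave": segment / rms}
            )
    return codebook


def select_entry(codebook, target_period):
    """Search: pick the entry whose pitch period is closest to the target.

    This nearest-period rule is a stand-in for the paper's search criterion.
    """
    return min(codebook, key=lambda entry: abs(entry["period"] - target_period))


def synthesize_excitation(codebook, cycles):
    """Synthesis: concatenate selected entries by overlap-add.

    `cycles` is a list of (period_in_samples, gain) pairs, one per voiced
    pitch cycle, as they might be generated by the statistical model.
    """
    longest = max(len(entry["wave"]) for entry in codebook)
    out = np.zeros(sum(period for period, _ in cycles) + longest)
    pos = 0
    for period, gain in cycles:
        wave = select_entry(codebook, period)["wave"] * gain
        out[pos:pos + len(wave)] += wave  # overlap-add at the current pitch mark
        pos += period
    return out
```

In this sketch the generated parameters only drive which stored waveform is chosen and how it is scaled; the waveforms themselves come from natural speech, which is the general motivation for codebook-based excitation over a plain pulse-noise source.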


Pages: 289 - 299

Number of pages: 11

50 items in total

[1] A novel codebook-based excitation model for use in speech synthesis
Csapo, Tamas Gabor
Nemeth, Geza
[J]. 3RD IEEE INTERNATIONAL CONFERENCE ON COGNITIVE INFOCOMMUNICATIONS (COGINFOCOM 2012), 2012, : 661 - 665
[2] Modeling Irregular Voice in Statistical Parametric Speech Synthesis With Residual Codebook Based Excitation
Csapo, Tamas Gabor
Nemeth, Geza
[J]. IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2014, 8 (02) : 209 - 220
[3] Improved Codebook-based Speech Enhancement based on MBE Model
Huang, Qizheng
Bao, Changchun
Wang, Xianyun
[J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3627 - 3631
[4] Codebook-based Bayesian speech enhancement
Srinivasan, S
Samuelsson, J
Kleijn, WB
[J]. 2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING, 2005, : 1077 - 1080
[5] Codebook-based Bayesian speech enhancement for nonstationary environments
Srinivasan, Sriram
Samuelsson, Jonas
Kleijn, W. Bastiaan
[J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (02) : 441 - 452
[6] Binaural Codebook-Based Speech Enhancement With Atomic Speech Presence Probability
Wood, Sean U. N.
Stahl, Johannes K. W.
Mowlaee, Pejman
[J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27 (12) : 2150 - 2161
[7] Codebook-based Speech Enhancement with Bayesian LP Parameters Estimation
Wang, Qing
Bao, Chang-chun
[J]. 2015 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2015, : 1245 - 1248
[8] Blind Bandwidth Extension for Codebook-based Bayesian Speech Enhancement
Li, Yaxing
Kim, Jonghyeon
Kang, Sangwon
[J]. 18TH IEEE INTERNATIONAL SYMPOSIUM ON CONSUMER ELECTRONICS (ISCE 2014), 2014,
[9] Speech Enhancement in Modulation Domain Using Codebook-based Speech and Noise Estimation
Mani, Vidhyasagar
Champagne, Benoit
Zhu, Wei-Ping
[J]. 2015 IEEE GLOBAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING (GLOBALSIP), 2015, : 707 - 711
[10] Real-Time Codebook-based Speech Enhancement with GPUs
Prasanna, A. N. Sai
Gurumurthy, Iyer Chandrashekaran
Naidu, D. H. R.
Baruah, Pallav Kumar
[J]. 2014 INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED AND GRID COMPUTING (PDGC), 2014, : 306 - 311
