A Pitch-Synchronous Speech Analysis and Synthesis Method for DNN-SPSS System

Cited by: 0

Authors:

Kim, Jin-Seob [1]

Joo, Young-Sun [1]

Kang, Hong-Goo [1]

Jang, Inseon [2]

Ahn, ChungHyun [2]

Seo, Jeongil [2]

Affiliations:

[1] Yonsei Univ, Dept Elect & Elect Engn, Seoul, South Korea

[2] Elect & Telecommun Res Inst, Realist Broadcasting Media Res Dept, Daejeon, South Korea

Source:

2016 IEEE INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING (DSP) | 2016

Keywords:

Deep neural network (DNN); statistical parametric speech synthesis (SPSS); pitch-synchronous; glottal closure instants (GCIs); DEEP NEURAL-NETWORKS;

DOI:

Not available

Chinese Library Classification:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

This paper proposes a pitch-synchronous deep neural network (DNN)-based statistical parametric speech synthesis (SPSS) system. Speech parameters are extracted from pitch-synchronous frames defined by the locations of glottal closure instants (GCIs), which significantly reduces coupling effects between the vocal tract and excitation signals. As a result, the distribution of spectral parameters within the same phonetic context class becomes more uniform, which improves model trainability, especially for a small-scale DNN framework. Although the effectiveness of the pitch-synchronous approach has been proven in other applications, it is not trivial to integrate the method into typical DNN-based SPSS systems, which have regularized structures, i.e., a fixed frame rate and a fixed feature dimension. In this paper, we design a new DNN-based SPSS system that trains and generates speech parameters pitch-synchronously. Objective and subjective test results verify the superiority of the proposed system over the conventional approach.
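The idea of frames "defined by the locations of GCIs" can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes GCI positions are already available as sample indices (a real system would obtain them from a GCI detector), and it assumes a common convention of two-period, Hann-windowed frames, which the abstract does not specify. The names `hann` and `pitch_synchronous_frames` are hypothetical.

```python
import math

def hann(n):
    """Symmetric Hann window of length n (assumed window choice)."""
    if n < 2:
        return [1.0] * n
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def pitch_synchronous_frames(signal, gcis):
    """Cut one windowed frame per interior GCI.

    Each frame spans from the previous GCI to the next GCI, so it covers
    roughly two pitch periods around the current GCI (an assumption made
    for illustration only).
    """
    frames = []
    for prev, nxt in zip(gcis[:-2], gcis[2:]):
        segment = signal[prev:nxt]
        window = hann(len(segment))
        frames.append([s * w for s, w in zip(segment, window)])
    return frames

# Synthetic 100 Hz tone at 16 kHz with hand-picked, slightly irregular GCIs.
fs = 16000
signal = [math.sin(2 * math.pi * 100.0 * n / fs) for n in range(700)]
gcis = [0, 160, 310, 470, 640]  # hypothetical GCI sample indices

frames = pitch_synchronous_frames(signal, gcis)
frame_lengths = [len(f) for f in frames]
print(frame_lengths)
```

The frame lengths differ because the GCI spacing follows the pitch period. That variability is the integration difficulty the abstract mentions: a conventional DNN-based SPSS pipeline expects a fixed frame rate and a fixed feature dimension.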

Pages: 408-411

Page count: 4
