Composite Wavelet Model for Stability-Oriented Speech Synthesis from Cepstral Features

Cited by: 0

Authors:

Koguchi, Junya [1]

Sagayama, Shigeki [1]

Affiliations:

[1] Meiji Univ, Tokyo, Japan

Source:

2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) | 2018

Keywords:

DOI:

Not available

Chinese Library Classification:

TM [Electrical Engineering]; TN [Electronics and Communication Technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

This paper discusses a stability-oriented vocoder for statistical speech synthesis based on a Gabor wavelet approximation of the source signal. In conventional vocoders with recursive filters, the filter gain characteristics often degrade sound quality: a sharp resonance driven by a particular overtone of the excitation signal can make the recursive filter behave unstably. To cope with this problem, we have proposed the Composite Wavelet Model (CWM), which avoids such filter-caused problems, and have made several improvements to it as a vocoder. Because it is based on non-recursive filters, it synthesizes stable speech that is robust to changes in the F0 parameter. In this paper, we further discuss the number of mixture components needed to improve synthetic speech quality, determine it through subjective evaluation experiments, and report the results of incorporating CWM into an HMM-based speech synthesis system. Objective evaluation experiments confirmed the improved stability of the amplitude of the synthetic speech.


Pages: 1697-1701

Page count: 5
