Composite Wavelet Model for Stability-Oriented Speech Synthesis from Cepstral Features

Authors
Koguchi, Junya [1]
Sagayama, Shigeki [1]
Affiliation
[1] Meiji Univ, Tokyo, Japan
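Keywords
DOI
Not available
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline classification codes
0808; 0809
Abstract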
This paper discusses a stability-oriented vocoder based on Gabor wavelet approximation of the source signal for statistical speech synthesis. In conventional vocoders with recursive filters, the filter gain characteristics often degrade sound quality: the recursive filters behave unstably when a sharp resonance is driven by a particular overtone of the excitation signal. To cope with this problem, we have proposed the Composite Wavelet Model (CWM), which avoids these filter-induced problems, and have made several improvements to it as a vocoder. Because it is based on non-recursive filters, it synthesizes stable speech that is robust to changes in the F0 parameter. In this paper, we further discuss the optimal number of mixture components for improving synthetic speech quality, determine it through subjective experimental evaluations, and report the results of incorporating the vocoder into an HMM-based speech synthesis system. Objective experimental evaluations confirmed the improved stability in the amplitude of the synthetic speech.
Pages: 1697-1701
Number of pages: 5