Composite Wavelet Model for Stability-Oriented Speech Synthesis from Cepstral Features

Authors
Koguchi, Junya [1 ]
Sagayama, Shigeki [1 ]
Affiliation
[1] Meiji Univ, Tokyo, Japan
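Keywords
DOI
Not available
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline classification code
0808 ; 0809 ;
Abstract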
This paper discusses a stability-oriented vocoder for statistical speech synthesis based on a Gabor wavelet approximation of the source signal. In conventional vocoders with recursive filters, the filter gain characteristics often degrade sound quality: the recursive filters behave unstably when a sharp resonance is driven by a particular overtone in the excitation signal. To cope with this problem, we have proposed the Composite Wavelet Model (CWM), which avoids filter-caused problems, and have made several improvements to it as a vocoder. Because it is based on non-recursive filters, it can synthesize stable speech that is robust to changes in the F0 parameter. In this paper, we further discuss the optimal number of mixture components for improving synthetic speech quality, determine it through subjective experimental evaluations, and report the results of incorporating the model into an HMM-based speech synthesis system. Objective experimental evaluations confirmed the improved stability in the amplitude of the synthetic speech.
Pages: 1697-1701
Number of pages: 5
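
The non-recursive synthesis idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it only assumes that each mixture component is a Gaussian in the frequency domain, so that it corresponds to a Gabor wavelet (a Gaussian-windowed cosine) in the time domain, and that the summed wavelets are overlap-added once per pitch period. The function names, the component values, and the `half_len` parameter are hypothetical and chosen for illustration only.

```python
import math


def gabor_wavelet(center_hz, bandwidth_hz, amplitude, sample_rate, half_len):
    """Return a Gaussian-windowed cosine of length 2 * half_len + 1.

    Assumption: bandwidth_hz is the standard deviation of the Gaussian
    in frequency, so the time-domain standard deviation is
    1 / (2 * pi * bandwidth_hz) seconds.
    """
    sigma_t = 1.0 / (2.0 * math.pi * bandwidth_hz)
    samples = []
    for n in range(-half_len, half_len + 1):
        t = n / sample_rate
        envelope = math.exp(-0.5 * (t / sigma_t) ** 2)
        samples.append(amplitude * envelope * math.cos(2.0 * math.pi * center_hz * t))
    return samples


def synthesize(components, f0_hz, sample_rate, num_samples, half_len=128):
    """Overlap-add a composite wavelet at every pitch period.

    components: list of (center_hz, bandwidth_hz, amplitude) tuples,
    one per mixture component (hypothetical values, not from the paper).
    No output sample is fed back into the computation, so the output
    amplitude stays bounded whatever f0_hz is.
    """
    # Sum the per-component wavelets into one composite wavelet.
    composite = [0.0] * (2 * half_len + 1)
    for center_hz, bandwidth_hz, amplitude in components:
        wavelet = gabor_wavelet(center_hz, bandwidth_hz, amplitude, sample_rate, half_len)
        composite = [c + w for c, w in zip(composite, wavelet)]

    # Place one copy of the composite wavelet at each pitch mark.
    signal = [0.0] * num_samples
    period = sample_rate / f0_hz
    k = 0
    while True:
        center = round(k * period)
        if center >= num_samples:
            break
        for i, value in enumerate(composite):
            idx = center + i - half_len
            if 0 <= idx < num_samples:
                signal[idx] += value
        k += 1
    return signal


# Illustrative three-component envelope at F0 = 100 Hz, 16 kHz sampling.
example_components = [(500.0, 100.0, 1.0), (1500.0, 100.0, 0.5), (2500.0, 100.0, 0.25)]
example_signal = synthesize(example_components, 100.0, 16000, 1600)
```

Changing `f0_hz` in this sketch only changes the spacing of the pitch marks; the wavelet shapes are untouched, which is the sense in which a non-recursive scheme avoids the resonance-driven instability the abstract attributes to recursive filters.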