Composite Wavelet Model for Stability-Oriented Speech Synthesis from Cepstral Features

Cited by: 0

Authors:

Koguchi, Junya [1]

Sagayama, Shigeki [1]

Affiliations:

[1] Meiji Univ, Tokyo, Japan

Source:

2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) | 2018

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) codes:

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]

Discipline classification codes:

0808; 0809

Abstract:

This paper discusses a stability-oriented vocoder for statistical speech synthesis based on a Gabor wavelet approximation of the source signal. In conventional vocoders built on recursive filters, the filter gain characteristics often degrade sound quality: sharp resonances excited by a particular overtone of the excitation signal make the recursive filter behave unstably. To cope with this problem, we have proposed the Composite Wavelet Model (CWM), which avoids filter-induced problems, and have made several improvements to it as a vocoder. Because it is based on non-recursive filters, it can synthesize stable speech that is robust to changes in the F0 parameter. In this paper, we further discuss the optimal number of mixture components for improving synthetic speech quality, determine it through subjective experimental evaluations, and report the results of incorporating CWM into an HMM-based speech synthesis system. Objective experimental evaluations confirmed improved stability in the amplitude of the synthetic speech.
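
The non-recursive idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no parameterization, so the code assumes that each mixture component is a Gaussian on the frequency axis (center frequency, bandwidth, weight) whose time-domain counterpart is a Gabor wavelet, and that speech is built by overlap-adding the summed wavelets at pitch marks spaced 1/F0 apart. All function names, parameters, and the example component values are hypothetical.

```python
import numpy as np

def gabor_wavelet(center_hz, bandwidth_hz, weight, sample_rate, half_len):
    """Time-domain Gabor wavelet for one (assumed) Gaussian spectral component."""
    t = np.arange(-half_len, half_len + 1) / sample_rate
    # A Gaussian of std bandwidth_hz in frequency maps to a Gaussian envelope in time.
    envelope = np.exp(-2.0 * (np.pi * bandwidth_hz * t) ** 2)
    return weight * envelope * np.cos(2.0 * np.pi * center_hz * t)

def composite_wavelet(components, sample_rate, half_len):
    """Sum the wavelets of all mixture components into one finite-length pulse."""
    return sum(gabor_wavelet(c, b, w, sample_rate, half_len)
               for c, b, w in components)

def synthesize(components, f0_hz, duration_s, sample_rate=16000, half_len=160):
    """Overlap-add the composite wavelet at every pitch mark (no feedback path)."""
    n = int(duration_s * sample_rate)
    pulse = composite_wavelet(components, sample_rate, half_len)
    out = np.zeros(n + pulse.size)   # padding so every pulse fits
    period = sample_rate / f0_hz     # pitch period in samples
    pos = 0.0
    while pos < n:
        start = int(pos)
        out[start:start + pulse.size] += pulse
        pos += period
    return out[half_len:half_len + n]

# Hypothetical two-component envelope: (center Hz, bandwidth Hz, weight)
components = [(500.0, 80.0, 1.0), (1500.0, 120.0, 0.5)]
low_pitch = synthesize(components, f0_hz=120.0, duration_s=0.1)
high_pitch = synthesize(components, f0_hz=300.0, duration_s=0.1)
```

Because each pulse has finite length and nothing is fed back into the output, the amplitude stays bounded for any F0 in this sketch, which is the kind of stability the abstract attributes to non-recursive filtering.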

Pages: 1697-1701

Page count: 5
