Composite Wavelet Model for Stability-Oriented Speech Synthesis from Cepstral Features

Cited by: 0

Authors:

Koguchi, Junya [1]

Sagayama, Shigeki [1]

Affiliations:

[1] Meiji Univ, Tokyo, Japan

Source:

2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) | 2018

Keywords:

DOI:

Not available

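Chinese Library Classification:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification codes:

0808; 0809;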

Abstract:

This paper discusses a stability-oriented vocoder based on Gabor wavelet approximation of the source signal for statistical speech synthesis. In conventional vocoders based on recursive filters, sound quality often degrades because the filters behave unstably when a sharp resonance is driven by a particular overtone of the excitation signal. To cope with this problem, we have proposed the Composite Wavelet Model (CWM), which avoids filter-caused problems, and have made several improvements to it as a vocoder. Because it is based on non-recursive filters, it can synthesize stable speech that is robust to changes in the F0 parameter. In this paper, we further discuss the optimal number of mixture components for improving synthetic speech quality, determine it through subjective experimental evaluations, and report the results of incorporating CWM into an HMM-based speech synthesis system. Objective experimental evaluations confirmed the improved stability in the amplitude of the synthetic speech.
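The non-recursive idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the spectral envelope is given as a small mixture of Gaussian peaks (hypothetical `(center_hz, bandwidth_hz, amplitude)` tuples), maps each peak to a Gabor wavelet (a Gaussian-windowed cosine), and overlap-adds the summed wavelets at pitch intervals. All function names and parameter values are illustrative assumptions.

```python
import math

def gabor_wavelet(center_hz, bandwidth_hz, amplitude, fs, half_len):
    """Gaussian-windowed cosine centered at sample index half_len."""
    # A Gaussian peak of spread bandwidth_hz in frequency corresponds to a
    # Gaussian envelope of spread 1 / (2*pi*bandwidth_hz) in time.
    sigma_t = 1.0 / (2.0 * math.pi * bandwidth_hz)
    samples = []
    for n in range(-half_len, half_len + 1):
        t = n / fs
        envelope = math.exp(-0.5 * (t / sigma_t) ** 2)
        samples.append(amplitude * envelope * math.cos(2.0 * math.pi * center_hz * t))
    return samples

def composite_wavelet(components, fs, half_len):
    """Sum one Gabor wavelet per mixture component (hypothetical parameters)."""
    total = [0.0] * (2 * half_len + 1)
    for center_hz, bandwidth_hz, amplitude in components:
        wavelet = gabor_wavelet(center_hz, bandwidth_hz, amplitude, fs, half_len)
        total = [a + b for a, b in zip(total, wavelet)]
    return total

def synthesize(components, f0_hz, fs, duration_s, half_len=200):
    """Overlap-add the composite wavelet once per pitch period.

    No output sample is fed back into the computation, so the amplitude is
    bounded by the number of overlapping wavelets times the summed amplitudes.
    """
    n_samples = int(duration_s * fs)
    out = [0.0] * n_samples
    pulse = composite_wavelet(components, fs, half_len)
    period = fs / f0_hz
    k = 0
    while True:
        center = int(round(k * period))
        if center >= n_samples:
            break
        for i, value in enumerate(pulse):
            idx = center - half_len + i
            if 0 <= idx < n_samples:
                out[idx] += value
        k += 1
    return out

# Illustrative two-component envelope; the values are made up for the sketch.
components = [(500.0, 80.0, 1.0), (1500.0, 120.0, 0.5)]
speech = synthesize(components, f0_hz=120.0, fs=16000, duration_s=0.05)
```

Changing `f0_hz` in this sketch only changes the spacing of the overlap-added wavelets, which is one way to read the abstract's claim of robustness to F0 changes; a recursive filter, by contrast, can ring strongly when a harmonic lands on a sharp resonance.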

Pages: 1697-1701

Page count: 5
