STATISTICAL F0 PREDICTION FOR ELECTROLARYNGEAL SPEECH ENHANCEMENT CONSIDERING GENERATIVE PROCESS OF F0 CONTOURS WITHIN PRODUCT OF EXPERTS FRAMEWORK

Cited by: 0

Authors:

Tanaka, Kou ^{[1]}

Kameoka, Hirokazu ^{[2]}

Toda, Tomoki ^{[3]}

Nakamura, Satoshi ^{[1]}

Affiliations:

[1] Nara Inst Sci & Technol, Grad Sch Informat Sci, Ikoma, Nara, Japan

[2] NTT Corp, NTT Commun Sci Labs, Tokyo, Tokyo, Japan

[3] Nagoya Univ, Informat Technol Ctr, Nagoya, Aichi 4648601, Japan

Source:

2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS | 2016

Keywords:

Electrolaryngeal speech enhancement; F0 prediction; Generative model; Product of Experts; VOICE CONVERSION

DOI:

Not available

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

We have previously proposed a statistical fundamental frequency (F0) prediction method that makes it possible to predict the underlying F0 contour of electrolaryngeal (EL) speech from its spectral feature sequence. Although this method was shown to improve the naturalness of EL speech as a whole, the predicted F0 contour was still unnatural compared with that of normal speech. One possible way to improve the naturalness of the predicted F0 contours is to take into account the physical mechanism of vocal phonation. Recently, a statistical model of voice F0 contours was formulated by constructing a stochastic counterpart of the Fujisaki model, a well-founded mathematical model representing the control mechanism of vocal fold vibration. This paper proposes a Product-of-Experts model to incorporate this generative model of voice F0 contours into the statistical F0 prediction model. Based on the constructed model, we derive algorithms for parameter training and F0 prediction. Experimental results revealed that the proposed method outperformed our previously proposed method in terms of the naturalness of the predicted F0 contours.

Pages: 5665 - 5669

Number of pages: 5

50 items in total

[21] Modelling and synthesising F0 contours with the Discrete Cosine Transform
Teutenberg, Jonathan
Watson, Catherine
Riddle, Patricia
2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 3973 - +
[22] Effects of tone and focus on the formation and alignment of f0 contours
Xu, Y
JOURNAL OF PHONETICS, 1999, 27 (01) : 55 - 105
[23] TRANSFORMATION OF F0 CONTOURS FOR LEXICAL TONES IN CONCATENATIVE SPEECH SYNTHESIS OF TONAL LANGUAGES
Trung-Nghia Phung
Luong, Mai Chi
Akagi, Masato
2012 INTERNATIONAL CONFERENCE ON SPEECH DATABASE AND ASSESSMENTS, 2012, : 129 - 134
[24] K-means and hierarchical clustering of f0 contours
Kaland, Constantijn
Steffman, Jeremy
Cole, Jennifer
INTERSPEECH 2024, 2024, : 1520 - 1524
[25] Determining the temporal interval of segments with the help of F0 contours
Xu, Yi
Liu, Fang
JOURNAL OF PHONETICS, 2007, 35 (03) : 398 - 420
[26] DECLINATION OF FUNDAMENTAL FREQUENCY (F0) IN SPEECH PRODUCTION
COOPER, WE
SORENSEN, JM
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1978, 63 : S67 - S67
[27] F0 analysis for Japanese conversational speech synthesis
Nakajima, Hideharu
Sagisaka, Yoshinori
2009 EIGHTH INTERNATIONAL SYMPOSIUM ON NATURAL LANGUAGE PROCESSING, PROCEEDINGS, 2009, : 137 - +
[28] Continuous F0 Modeling for HMM Based Statistical Parametric Speech Synthesis
Yu, Kai
Young, Steve
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2011, 19 (05) : 1071 - 1079
[29] SAFE: a Statistical Algorithm for F0 Estimation for Both Clean and Noisy Speech
Chu, Wei
Alwan, Abeer
11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2598 - 2601
[30] Role of the scalar f0(980) in the process Ds+→π+π0π0*
张晗
吕云鹤
刘利娟
王恩
Chinese Physics C, 2023, (04) : 47 - 53