STATISTICAL F0 PREDICTION FOR ELECTROLARYNGEAL SPEECH ENHANCEMENT CONSIDERING GENERATIVE PROCESS OF F0 CONTOURS WITHIN PRODUCT OF EXPERTS FRAMEWORK

Cited by: 0

Authors:

Tanaka, Kou [1]

Kameoka, Hirokazu [2]

Toda, Tomoki [3]

Nakamura, Satoshi [1]

Affiliations:

[1] Nara Inst Sci & Technol, Grad Sch Informat Sci, Ikoma, Nara, Japan

[2] NTT Corp, NTT Commun Sci Labs, Tokyo, Tokyo, Japan

[3] Nagoya Univ, Informat Technol Ctr, Nagoya, Aichi 4648601, Japan

Source:

2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS | 2016

Keywords:

Electrolaryngeal speech enhancement; F0 prediction; Generative model; Product of Experts; Voice conversion

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

We have previously proposed a statistical fundamental frequency (F0) prediction method that makes it possible to predict the underlying F0 contour of electrolaryngeal (EL) speech from its spectral feature sequence. Although this method was shown to contribute to improving the naturalness of EL speech as a whole, the predicted F0 contour was still unnatural compared with that in normal speech. One possible solution to improve the naturalness of the predicted F0 contours would be to take account of the physical mechanism of vocal phonation. Recently, a statistical model of voice F0 contours was formulated by constructing a stochastic counterpart of the Fujisaki model, a well-founded mathematical model representing the control mechanism of vocal fold vibration. This paper proposes a Product-of-Experts model to incorporate this generative model of voice F0 contours into the statistical F0 prediction model. Based on the constructed model, we derive algorithms for parameter training and F0 prediction. Experimental results revealed that the proposed method successfully outperformed our previously proposed method in terms of the naturalness of the predicted F0 contours.


Pages: 5665-5669

Number of pages: 5
