STATISTICAL F0 PREDICTION FOR ELECTROLARYNGEAL SPEECH ENHANCEMENT CONSIDERING GENERATIVE PROCESS OF F0 CONTOURS WITHIN PRODUCT OF EXPERTS FRAMEWORK

Cited by: 0

Authors:

Tanaka, Kou [1]

Kameoka, Hirokazu [2]

Toda, Tomoki [3]

Nakamura, Satoshi [1]

Affiliations:

[1] Nara Inst Sci & Technol, Grad Sch Informat Sci, Ikoma, Nara, Japan

[2] NTT Corp, NTT Commun Sci Labs, Tokyo, Tokyo, Japan

[3] Nagoya Univ, Informat Technol Ctr, Nagoya, Aichi 4648601, Japan

Source:

2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS | 2016

Keywords:

Electrolaryngeal speech enhancement; F0 prediction; Generative model; Product of Experts; Voice conversion

DOI:

Not available

Chinese Library Classification number:

O42 [Acoustics]

Discipline classification code:

070206; 082403

Abstract:

We have previously proposed a statistical fundamental frequency (F0) prediction method that makes it possible to predict the underlying F0 contour of electrolaryngeal (EL) speech from its spectral feature sequence. Although this method was shown to contribute to improving the naturalness of EL speech as a whole, the predicted F0 contour was still unnatural compared with that in normal speech. One possible solution to improve the naturalness of the predicted F0 contours would be to take account of the physical mechanism of vocal phonation. Recently, a statistical model of voice F0 contours was formulated by constructing a stochastic counterpart of the Fujisaki model, a well-founded mathematical model representing the control mechanism of vocal fold vibration. This paper proposes a Product-of-Experts model to incorporate this generative model of voice F0 contours into the statistical F0 prediction model. Based on the constructed model, we derive algorithms for parameter training and F0 prediction. Experimental results revealed that the proposed method successfully outperformed our previously proposed method in terms of the naturalness of the predicted F0 contours.


Pages: 5665-5669

Number of pages: 5
