STATISTICAL F0 PREDICTION FOR ELECTROLARYNGEAL SPEECH ENHANCEMENT CONSIDERING GENERATIVE PROCESS OF F0 CONTOURS WITHIN PRODUCT OF EXPERTS FRAMEWORK

Cited by: 0

Authors:

Tanaka, Kou [1]

Kameoka, Hirokazu [2]

Toda, Tomoki [3]

Nakamura, Satoshi [1]

Affiliations:

[1] Nara Inst Sci & Technol, Grad Sch Informat Sci, Ikoma, Nara, Japan

[2] NTT Corp, NTT Commun Sci Labs, Tokyo, Tokyo, Japan

[3] Nagoya Univ, Informat Technol Ctr, Nagoya, Aichi 4648601, Japan

Source:

2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS | 2016

Keywords:

Electrolaryngeal speech enhancement; F0 prediction; Generative model; Product of Experts; Voice conversion

DOI:

Not available

Chinese Library Classification:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

We have previously proposed a statistical fundamental frequency (F0) prediction method that makes it possible to predict the underlying F0 contour of electrolaryngeal (EL) speech from its spectral feature sequence. Although this method was shown to contribute to improving the naturalness of EL speech as a whole, the predicted F0 contour was still unnatural compared with that of normal speech. One possible way to improve the naturalness of the predicted F0 contours is to take account of the physical mechanism of vocal phonation. Recently, a statistical model of voice F0 contours was formulated by constructing a stochastic counterpart of the Fujisaki model, a well-founded mathematical model representing the control mechanism of vocal fold vibration. This paper proposes a Product-of-Experts model to incorporate this generative model of voice F0 contours into the statistical F0 prediction model. Based on the constructed model, we derive algorithms for parameter training and F0 prediction. Experimental results revealed that the proposed method outperformed our previously proposed method in terms of the naturalness of the predicted F0 contours.

Pages: 5665-5669

Page count: 5
