STATISTICAL F0 PREDICTION FOR ELECTROLARYNGEAL SPEECH ENHANCEMENT CONSIDERING GENERATIVE PROCESS OF F0 CONTOURS WITHIN PRODUCT OF EXPERTS FRAMEWORK

Cited by: 0
Authors
Tanaka, Kou [1]
Kameoka, Hirokazu [2]
Toda, Tomoki [3]
Nakamura, Satoshi [1]
Affiliations
[1] Nara Inst Sci & Technol, Grad Sch Informat Sci, Ikoma, Nara, Japan
[2] NTT Corp, NTT Commun Sci Labs, Tokyo, Tokyo, Japan
[3] Nagoya Univ, Informat Technol Ctr, Nagoya, Aichi 4648601, Japan
Keywords
Electrolaryngeal speech enhancement; F0 prediction; Generative model; Product of Experts; Voice conversion
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
We have previously proposed a statistical fundamental frequency (F0) prediction method that makes it possible to predict the underlying F0 contour of electrolaryngeal (EL) speech from its spectral feature sequence. Although this method was shown to contribute to improving the naturalness of EL speech as a whole, the predicted F0 contour was still unnatural compared with that in normal speech. One possible solution to improve the naturalness of the predicted F0 contours would be to take account of the physical mechanism of vocal phonation. Recently, a statistical model of voice F0 contours was formulated by constructing a stochastic counterpart of the Fujisaki model, a well-founded mathematical model representing the control mechanism of vocal fold vibration. This paper proposes a Product-of-Experts model to incorporate this generative model of voice F0 contours into the statistical F0 prediction model. Based on the constructed model, we derive algorithms for parameter training and F0 prediction. Experimental results revealed that the proposed method successfully outperformed our previously proposed method in terms of the naturalness of the predicted F0 contours.
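Illustrative note (not part of the original record): the abstract does not give the form of the experts, so the following is only a minimal sketch of the general Product-of-Experts idea, assuming two hypothetical one-dimensional Gaussian experts over the log F0 of a single frame. The function name, the expert labels, and all numeric values are assumptions made for illustration and are not taken from the paper. Multiplying Gaussian densities and renormalizing gives another Gaussian whose precision is the sum of the experts' precisions and whose mean is the precision-weighted average of their means.

```python
def product_of_gaussian_experts(means, variances):
    """Combine 1-D Gaussian experts by multiplying their densities.

    The renormalized product of Gaussians is Gaussian: precisions
    (inverse variances) add, and the mean is the precision-weighted
    average of the expert means.
    """
    precisions = [1.0 / v for v in variances]
    total_precision = sum(precisions)
    mean = sum(p * m for p, m in zip(precisions, means)) / total_precision
    return mean, 1.0 / total_precision


# Hypothetical values for one frame (log F0), chosen only for illustration:
# expert 1 stands in for a spectrum-based predictor,
# expert 2 stands in for a contour-model prior.
expert_means = [5.0, 4.8]
expert_variances = [0.04, 0.01]

combined_mean, combined_variance = product_of_gaussian_experts(
    expert_means, expert_variances
)
print(combined_mean, combined_variance)
```

In this toy setting the combined estimate is pulled toward the more confident (lower-variance) expert, and the combined variance is smaller than either expert's variance. The actual model in the paper operates on whole contours and has its own training and prediction algorithms, which this sketch does not attempt to reproduce.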
Pages: 5665-5669
Number of pages: 5
Related papers
50 in total
  • [31] TUSK: A framework for overviewing the performance of F0 estimators
    Morise, Masanori
    Kawahara, Hideki
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 1790 - 1794
  • [32] Auditive learning based Chinese F0 prediction
    Tao, JH
    Ni, X
    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING I, 2003, : 500 - 503
  • [33] Improving F0 Prediction Using Bidirectional Associative Memories and Syllable-Level F0 Features for HMM-based Mandarin Speech Synthesis
    Gao, Li
    Ling, Zhen-Hua
    Chen, Ling-Hui
    Dai, Li-Rong
    2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, : 275 - 279
  • [34] Auditive learning based Chinese F0 prediction
    Tao, JH
    Ni, X
    2003 INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOL III, PROCEEDINGS, 2003, : 213 - 216
  • [35] Prediction of F0 parameter of contextualized utterances in dialogue
    Yamashita, Y
    Mizoguchi, R
    ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 1509 - 1512
  • [36] A Vibration Control Method of an Electrolarynx Based on Statistical F0 Pattern Prediction
    Tanaka, Kou
    Toda, Tomoki
    Nakamura, Satoshi
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2017, E100D (09): : 2165 - 2173
  • [37] Using F0 Contours to Assess Nativeness in a Sentence Repeat Task
    Ma, Min
    Evanini, Keelan
    Loukina, Anastassia
    Wang, Xinhao
    Zechner, Klaus
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 653 - 657
  • [38] A functional model for generation of the local components of F0 contours in Chinese
    Ni, JF
    Wang, RH
    Xia, DY
    ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 1644 - 1647
  • [39] A Method for Automatically Estimating F0 Model Parameters and A Speech Re-Synthesis Tool Using F0 Model and STRAIGHT
    Sato, Shota
    Kimura, Taro
    Horiuchi, Yasuo
    Nishida, Masafumi
    Kuroiwa, Shingo
    Ichikawa, Akira
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 545 - +
  • [40] F0 estimation of noisy speech based on complex speech analysis
    Kinjo, Tatsuhiko
    Funaki, Keiichi
    2006 IEEE 12TH DIGITAL SIGNAL PROCESSING WORKSHOP & 4TH IEEE SIGNAL PROCESSING EDUCATION WORKSHOP, VOLS 1 AND 2, 2006, : 434 - 437