STATISTICAL F0 PREDICTION FOR ELECTROLARYNGEAL SPEECH ENHANCEMENT CONSIDERING GENERATIVE PROCESS OF F0 CONTOURS WITHIN PRODUCT OF EXPERTS FRAMEWORK

Times cited: 0
Authors
Tanaka, Kou [1]
Kameoka, Hirokazu [2]
Toda, Tomoki [3]
Nakamura, Satoshi [1]
Affiliations
[1] Nara Inst Sci & Technol, Grad Sch Informat Sci, Ikoma, Nara, Japan
[2] NTT Corp, NTT Commun Sci Labs, Tokyo, Tokyo, Japan
[3] Nagoya Univ, Informat Technol Ctr, Nagoya, Aichi 4648601, Japan
Keywords
Electrolaryngeal speech enhancement; F0 prediction; Generative model; Product of Experts; Voice conversion
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
We have previously proposed a statistical fundamental frequency (F0) prediction method that makes it possible to predict the underlying F0 contour of electrolaryngeal (EL) speech from its spectral feature sequence. Although this method was shown to improve the naturalness of EL speech as a whole, the predicted F0 contour was still unnatural compared with that of normal speech. One possible way to improve the naturalness of the predicted F0 contours is to take into account the physical mechanism of vocal phonation. Recently, a statistical model of voice F0 contours was formulated by constructing a stochastic counterpart of the Fujisaki model, a well-founded mathematical model representing the control mechanism of vocal fold vibration. This paper proposes a Product-of-Experts model to incorporate this generative model of voice F0 contours into the statistical F0 prediction model. Based on the constructed model, we derive algorithms for parameter training and F0 prediction. Experimental results revealed that the proposed method outperformed our previously proposed method in terms of the naturalness of the predicted F0 contours.
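The Product-of-Experts idea mentioned in the abstract can be illustrated with a deliberately simplified sketch. It is not the authors' model: it assumes each expert is an independent per-frame Gaussian over log-F0, in which case the normalized product is again Gaussian with a precision-weighted mean. The function name `product_of_gaussian_experts` and the example values are hypothetical, chosen only for illustration.

```python
import numpy as np

def product_of_gaussian_experts(mu_a, var_a, mu_b, var_b):
    """Combine two per-frame Gaussian experts over the same quantity.

    Assumption (not from the paper): both experts are independent
    Gaussians per frame, e.g. one from a spectrum-to-F0 predictor and
    one from an F0 contour prior. The normalized product of two
    Gaussians is Gaussian with summed precisions and a
    precision-weighted mean.
    """
    mu_a = np.asarray(mu_a, dtype=float)
    mu_b = np.asarray(mu_b, dtype=float)
    prec_a = 1.0 / np.asarray(var_a, dtype=float)  # precision = 1 / variance
    prec_b = 1.0 / np.asarray(var_b, dtype=float)
    var = 1.0 / (prec_a + prec_b)
    mu = var * (prec_a * mu_a + prec_b * mu_b)
    return mu, var

# Hypothetical log-F0 means for three frames from each expert.
mu, var = product_of_gaussian_experts(
    mu_a=[5.0, 5.1, 5.2], var_a=[1.0, 1.0, 1.0],
    mu_b=[5.4, 5.1, 5.0], var_b=[1.0, 0.25, 4.0],
)
```

In this toy setting the combined mean is pulled toward whichever expert is more confident (smaller variance) in each frame, and the combined variance is never larger than either expert's variance.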
Pages: 5665 - 5669
Number of pages: 5
Related papers
50 in total
  • [21] Modelling and synthesising F0 contours with the Discrete Cosine Transform
    Teutenberg, Jonathan
    Watson, Catherine
    Riddle, Patricia
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 3973 - +
  • [22] Effects of tone and focus on the formation and alignment of f0 contours
    Xu, Y
    JOURNAL OF PHONETICS, 1999, 27 (01) : 55 - 105
  • [23] TRANSFORMATION OF F0 CONTOURS FOR LEXICAL TONES IN CONCATENATIVE SPEECH SYNTHESIS OF TONAL LANGUAGES
    Trung-Nghia Phung
    Luong, Mai Chi
    Akagi, Masato
    2012 INTERNATIONAL CONFERENCE ON SPEECH DATABASE AND ASSESSMENTS, 2012, : 129 - 134
  • [24] K-means and hierarchical clustering of f0 contours
    Kaland, Constantijn
    Steffman, Jeremy
    Cole, Jennifer
    INTERSPEECH 2024, 2024, : 1520 - 1524
  • [25] Determining the temporal interval of segments with the help of F0 contours
    Xu, Yi
    Liu, Fang
    JOURNAL OF PHONETICS, 2007, 35 (03) : 398 - 420
  • [26] DECLINATION OF FUNDAMENTAL FREQUENCY (F0) IN SPEECH PRODUCTION
    COOPER, WE
    SORENSEN, JM
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1978, 63 : S67 - S67
  • [27] F0 analysis for Japanese conversational speech synthesis
    Nakajima, Hideharu
    Sagisaka, Yoshinori
    2009 EIGHTH INTERNATIONAL SYMPOSIUM ON NATURAL LANGUAGE PROCESSING, PROCEEDINGS, 2009, : 137 - +
  • [28] Continuous F0 Modeling for HMM Based Statistical Parametric Speech Synthesis
    Yu, Kai
    Young, Steve
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2011, 19 (05) : 1071 - 1079
  • [29] SAFE: a Statistical Algorithm for F0 Estimation for Both Clean and Noisy Speech
    Chu, Wei
    Alwan, Abeer
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2598 - 2601
  • [30] Role of the scalar f0(980) in the process Ds+→π+π0π0*
    张晗
    吕云鹤
    刘利娟
    王恩
    Chinese Physics C, 2023, (04) : 47 - 53