STATISTICAL F0 PREDICTION FOR ELECTROLARYNGEAL SPEECH ENHANCEMENT CONSIDERING GENERATIVE PROCESS OF F0 CONTOURS WITHIN PRODUCT OF EXPERTS FRAMEWORK

Cited by: 0
Authors
Tanaka, Kou [1]
Kameoka, Hirokazu [2]
Toda, Tomoki [3]
Nakamura, Satoshi [1]
Affiliations
[1] Nara Inst Sci & Technol, Grad Sch Informat Sci, Ikoma, Nara, Japan
[2] NTT Corp, NTT Commun Sci Labs, Tokyo, Tokyo, Japan
[3] Nagoya Univ, Informat Technol Ctr, Nagoya, Aichi 4648601, Japan
Keywords
Electrolaryngeal speech enhancement; F0 prediction; Generative model; Product of Experts; Voice conversion
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
We have previously proposed a statistical fundamental frequency (F0) prediction method that makes it possible to predict the underlying F0 contour of electrolaryngeal (EL) speech from its spectral feature sequence. Although this method was shown to contribute to improving the naturalness of EL speech as a whole, the predicted F0 contour was still unnatural compared with that in normal speech. One possible solution to improve the naturalness of the predicted F0 contours would be to take account of the physical mechanism of vocal phonation. Recently, a statistical model of voice F0 contours was formulated by constructing a stochastic counterpart of the Fujisaki model, a well-founded mathematical model representing the control mechanism of vocal fold vibration. This paper proposes a Product-of-Experts model to incorporate this generative model of voice F0 contours into the statistical F0 prediction model. Based on the constructed model, we derive algorithms for parameter training and F0 prediction. Experimental results revealed that the proposed method successfully outperformed our previously proposed method in terms of the naturalness of the predicted F0 contours.
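As a rough illustration of the Product-of-Experts idea named in the abstract, the sketch below multiplies two per-frame Gaussian "experts" over log-F0 and renormalizes. This is not the authors' model: the abstract does not give the form of either expert, so the Gaussian assumption, the function name `product_of_gaussian_experts`, and all numbers are hypothetical and chosen only to show how a product of densities favors values that both experts find plausible.

```python
import numpy as np

def product_of_gaussian_experts(mu_a, var_a, mu_b, var_b):
    """Combine two per-frame Gaussian experts by multiplying their densities.

    The renormalized product of N(mu_a, var_a) and N(mu_b, var_b) is again
    Gaussian: its precision is the sum of the two precisions, and its mean
    is the precision-weighted average of the two means.
    """
    mu_a, var_a, mu_b, var_b = map(np.asarray, (mu_a, var_a, mu_b, var_b))
    precision = 1.0 / var_a + 1.0 / var_b
    var = 1.0 / precision
    mu = var * (mu_a / var_a + mu_b / var_b)
    return mu, var

# Hypothetical log-F0 values for three frames (illustrative numbers only).
# Expert A stands in for a prediction from spectral features (assumption).
spectral_mu = np.array([4.8, 5.0, 5.2])
spectral_var = np.array([0.04, 0.04, 0.04])
# Expert B stands in for a contour model with a tighter belief (assumption).
contour_mu = np.array([4.9, 4.9, 4.9])
contour_var = np.array([0.01, 0.01, 0.01])

mu, var = product_of_gaussian_experts(
    spectral_mu, spectral_var, contour_mu, contour_var
)
print(mu, var)
```

In this toy setting the combined mean sits closer to the expert with the smaller variance, and the combined variance is smaller than either input variance, which is the general behavior of a product of Gaussian experts.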
Pages: 5665-5669
Number of pages: 5