Quantized F0 Context and Its Applications to Speech Synthesis, Speech Coding and Voice Conversion

Authors
Nose, Takashi [1]
Kobayashi, Takao [2]
Affiliations
[1] Tohoku Univ, Grad Sch Engn, Sendai, Miyagi 980, Japan
[2] Tokyo Inst Technol, Interdisciplinary Grad Sch Sci & Engn, Yokohama, Kanagawa 227, Japan
Source
2014 TENTH INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION HIDING AND MULTIMEDIA SIGNAL PROCESSING (IIH-MSP 2014) | 2014
Keywords
HMM-based speech synthesis; quantized F0 context; low bit-rate speech coding; voice conversion; HMM;
DOI
10.1109/IIH-MSP.2014.149
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper describes a technique for language-independent prosody modeling using unsupervised prosodic labeling in HMM-based speech synthesis and shows its applications to low bit-rate speech coding and speaker-independent voice conversion. In the proposed technique, sequences of prosodic features are coarsely quantized at the phone level, and the resulting indexes are used as the prosodic context for model training. Conventional HMM-based speech synthesis requires accurate prosodic labels corresponding to the speech samples, and manual correction is necessary to improve the modeling accuracy, which sometimes incurs extra cost and limits its application. In contrast, the proposed technique creates the prosodic labels from the training data itself and can be applied not only to speech synthesis but also to speech coding and voice conversion. Subjective experimental results show the effectiveness of the quantized F0 context without manual prosodic labeling.
Pages: 578-581
Page count: 4
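
Illustrative sketch (not taken from the paper): the abstract states that prosodic features are quantized at the phone level and that the resulting indexes serve as prosodic context. The Python below shows one way such a step might look. It assumes the feature is the mean log F0 over the voiced frames of each phone and that the bins are uniform over the range observed in the utterance. Both choices, and the function name `quantize_f0_per_phone`, are hypothetical and may differ from the authors' method.

```python
import math


def quantize_f0_per_phone(f0_hz, phone_bounds, n_levels=4):
    """Return one coarse F0 index per phone (None for fully unvoiced phones).

    f0_hz        -- frame-level F0 values in Hz, with 0 marking unvoiced frames
    phone_bounds -- list of (start_frame, end_frame) pairs, end exclusive
    n_levels     -- number of quantization levels (an assumed setting)
    """
    # Mean log F0 over the voiced frames of each phone.
    means = []
    for start, end in phone_bounds:
        voiced = [math.log(f) for f in f0_hz[start:end] if f > 0]
        means.append(sum(voiced) / len(voiced) if voiced else None)

    observed = [m for m in means if m is not None]
    if not observed:
        return [None] * len(means)

    lo, hi = min(observed), max(observed)
    # Uniform bin width; fall back to 1.0 when every phone has the same F0.
    width = (hi - lo) / n_levels or 1.0

    indexes = []
    for m in means:
        if m is None:
            indexes.append(None)
        else:
            # Clamp so the highest value falls in the top bin.
            indexes.append(min(int((m - lo) / width), n_levels - 1))
    return indexes


# Toy utterance: five phones of two frames each, the first one unvoiced.
f0 = [0, 0, 100, 100, 130, 130, 250, 250, 400, 400]
phones = [(0, 2), (2, 4), (4, 6), (6, 8), (8, 10)]
labels = quantize_f0_per_phone(f0, phones, n_levels=4)
print(labels)  # [None, 0, 0, 2, 3]
```

In a context-dependent labeling scheme, each index could then be attached to its phone label as an extra context field, so that no manually annotated prosodic labels are needed.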