Quantized F0 Context and Its Applications to Speech Synthesis, Speech Coding and Voice Conversion

Authors
Nose, Takashi [1]
Kobayashi, Takao [2]
Affiliations
[1] Tohoku Univ, Grad Sch Engn, Sendai, Miyagi 980, Japan
[2] Tokyo Inst Technol, Interdisciplinary Grad Sch Sci & Engn, Yokohama, Kanagawa 227, Japan
Source
2014 TENTH INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION HIDING AND MULTIMEDIA SIGNAL PROCESSING (IIH-MSP 2014), 2014
Keywords
HMM-based speech synthesis; quantized F0 context; low bit-rate speech coding; voice conversion; HMM
DOI
10.1109/IIH-MSP.2014.149
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper describes a technique for language-independent prosody modeling based on unsupervised prosodic labeling in HMM-based speech synthesis, and shows its applications to low bit-rate speech coding and speaker-independent voice conversion. In the proposed technique, sequences of prosodic features are coarsely quantized at the phone level, and the resulting indexes are used as the prosodic context for model training. Conventional HMM-based speech synthesis requires accurate prosodic labels for the speech samples, and manual correction of those labels is needed to improve modeling accuracy, which can add cost and limit the range of applications. In contrast, the proposed technique derives the prosodic labels from the training data itself, so it can be applied not only to speech synthesis but also to speech coding and voice conversion. Subjective evaluation results show that the quantized F0 context is effective without manual prosodic labeling.
Pages: 578-581
Page count: 4
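
The phone-level quantization mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which prosodic statistic is quantized or how the quantization levels are chosen, so everything specific here is an assumption made for illustration: the per-phone mean of log F0 over voiced frames, uniform quantization between the minimum and maximum phone means, the function names `phone_mean_log_f0`, `quantize_phone_f0`, and `f0_context_labels`, and the `phone/F0:index` label format. It is not the authors' implementation.

```python
import math
from typing import List, Optional, Sequence, Tuple


def phone_mean_log_f0(
    f0_hz: Sequence[float], segments: Sequence[Tuple[int, int]]
) -> List[Optional[float]]:
    """Mean log F0 per phone over voiced frames (F0 > 0).

    `segments` holds (start, end) frame indexes, end exclusive.
    A phone with no voiced frame yields None.
    Assumption: the mean of log F0 is the quantized statistic.
    """
    means: List[Optional[float]] = []
    for start, end in segments:
        voiced = [math.log(f) for f in f0_hz[start:end] if f > 0.0]
        means.append(sum(voiced) / len(voiced) if voiced else None)
    return means


def quantize_phone_f0(
    means: Sequence[Optional[float]], num_levels: int
) -> List[Optional[int]]:
    """Uniformly quantize phone-level means into `num_levels` indexes.

    Assumption: levels evenly divide the range [min, max] of the voiced
    phone means. Unvoiced phones (None) stay None.
    """
    voiced = [m for m in means if m is not None]
    if not voiced:
        return [None] * len(means)
    low, high = min(voiced), max(voiced)
    width = (high - low) / num_levels
    indexes: List[Optional[int]] = []
    for m in means:
        if m is None:
            indexes.append(None)
        elif width == 0.0:
            indexes.append(0)  # all voiced phones share one value
        else:
            # Clamp so the maximum value falls in the top level.
            indexes.append(min(int((m - low) / width), num_levels - 1))
    return indexes


def f0_context_labels(
    phones: Sequence[str], indexes: Sequence[Optional[int]]
) -> List[str]:
    """Attach the quantized index to each phone (hypothetical label format)."""
    return [
        f"{p}/F0:{'x' if i is None else i}" for p, i in zip(phones, indexes)
    ]


if __name__ == "__main__":
    # Toy frame-level F0 track in Hz; 0.0 marks unvoiced frames.
    f0 = [0.0, 0.0, 100.0, 100.0, 150.0, 150.0, 250.0, 250.0, 400.0, 400.0]
    segs = [(0, 2), (2, 4), (4, 6), (6, 8), (8, 10)]
    idx = quantize_phone_f0(phone_mean_log_f0(f0, segs), num_levels=4)
    print(f0_context_labels(["sil", "a", "k", "i", "o"], idx))
```

Because the indexes come from the speech data alone, a sketch like this needs no manually annotated prosodic labels, which is the property the abstract emphasizes. The number of levels controls the trade-off between label detail and, in a coding setting, the bits spent per phone.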