Quantized F0 Context and Its Applications to Speech Synthesis, Speech Coding and Voice Conversion

Cited by: 0
Authors
Nose, Takashi [1]
Kobayashi, Takao [2]
Affiliations
[1] Tohoku Univ, Grad Sch Engn, Sendai, Miyagi 980, Japan
[2] Tokyo Inst Technol, Interdisciplinary Grad Sch Sci & Engn, Yokohama, Kanagawa 227, Japan
Keywords
HMM-based speech synthesis; quantized F0 context; low bit-rate speech coding; voice conversion; HMM
DOI
10.1109/IIH-MSP.2014.149
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper describes a technique for language-independent prosody modeling that uses unsupervised prosodic labeling in HMM-based speech synthesis, and shows its applications to low bit-rate speech coding and speaker-independent voice conversion. In the proposed technique, sequences of prosodic features are coarsely quantized at the phone level, and the resulting indexes are used as the prosodic context for model training. Conventional HMM-based speech synthesis requires accurate prosodic labels for the speech samples, and these labels often must be corrected manually to improve modeling accuracy; this adds cost and limits the range of applications. In contrast, the proposed technique derives the prosodic labels from the training data itself and can be applied not only to speech synthesis but also to speech coding and voice conversion. Subjective evaluation results demonstrate the effectiveness of the quantized F0 context without manual prosodic labeling.
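The phone-level quantization step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a frame-level F0 track in Hz (0 for unvoiced frames), phone boundaries given as (start, end) frame indexes, averaging of log F0 over the voiced frames of each phone, z-score normalization over the utterance, and uniform quantization into `num_levels` bins within plus or minus `clip` standard deviations. The function names, the `num_levels` and `clip` parameters, and the use of -1 as the symbol for unvoiced phones are all hypothetical choices made for illustration.

```python
import math
from typing import List, Optional, Sequence, Tuple


def phone_mean_log_f0(
    f0: Sequence[float], segments: Sequence[Tuple[int, int]]
) -> List[Optional[float]]:
    """Mean log F0 over the voiced frames (f0 > 0) of each phone.

    Returns None for a phone with no voiced frame (assumed convention).
    """
    means: List[Optional[float]] = []
    for start, end in segments:
        voiced = [math.log(v) for v in f0[start:end] if v > 0]
        means.append(sum(voiced) / len(voiced) if voiced else None)
    return means


def quantize_f0_context(
    f0: Sequence[float],
    segments: Sequence[Tuple[int, int]],
    num_levels: int = 5,  # hypothetical number of quantization levels
    clip: float = 2.0,    # hypothetical clipping range in standard deviations
) -> List[int]:
    """Map each phone to a quantized F0 index usable as a context label.

    Assumption: indexes are relative to the utterance's own log F0
    mean and standard deviation; unvoiced phones get -1.
    """
    means = phone_mean_log_f0(f0, segments)
    voiced = [m for m in means if m is not None]
    if not voiced:
        return [-1] * len(means)
    mu = sum(voiced) / len(voiced)
    var = sum((m - mu) ** 2 for m in voiced) / len(voiced)
    sigma = math.sqrt(var) or 1.0  # avoid division by zero for flat F0
    step = 2.0 * clip / num_levels
    labels: List[int] = []
    for m in means:
        if m is None:
            labels.append(-1)
            continue
        # Normalize, clip to [-clip, clip], then bin uniformly.
        z = max(-clip, min(clip, (m - mu) / sigma))
        labels.append(min(int((z + clip) / step), num_levels - 1))
    return labels


# Three phones: low-pitched, unvoiced, high-pitched.
print(quantize_f0_context([100, 100, 0, 0, 200, 200], [(0, 2), (2, 4), (4, 6)]))
```

For the three-phone example, the first and third phones sit one standard deviation below and above the utterance mean, so they fall into bins 1 and 3 of 5, and the unvoiced phone receives -1. Because the indexes come only from the F0 track itself, no manual prosodic annotation is needed, which is the property the abstract emphasizes.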
Pages: 578-581
Number of pages: 4