Quantized F0 Context and Its Applications to Speech Synthesis, Speech Coding and Voice Conversion

Cited by: 0
Authors
Nose, Takashi [1 ]
Kobayashi, Takao [2 ]
Affiliations
[1] Tohoku Univ, Grad Sch Engn, Sendai, Miyagi 980, Japan
[2] Tokyo Inst Technol, Interdisciplinary Grad Sch Sci & Engn, Yokohama, Kanagawa 227, Japan
Source
2014 TENTH INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION HIDING AND MULTIMEDIA SIGNAL PROCESSING (IIH-MSP 2014) | 2014
Keywords
HMM-based speech synthesis; quantized F0 context; low bit-rate speech coding; voice conversion; HMM;
DOI
10.1109/IIH-MSP.2014.149
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper describes a technique for language-independent prosody modeling based on unsupervised prosodic labeling in HMM-based speech synthesis, and shows its application to low bit-rate speech coding and speaker-independent voice conversion. In the proposed technique, sequences of prosodic features are coarsely quantized at the phone level, and the resulting indexes are used as the prosodic context for model training. Conventional HMM-based speech synthesis requires accurate prosodic labels for the speech samples, and manual correction is needed to improve modeling accuracy, which can add cost and limits its applicability. In contrast, the proposed technique derives the prosodic labels from the training data itself and can be applied not only to speech synthesis but also to speech coding and voice conversion. Subjective experimental results show that the quantized F0 context is effective without manual prosodic labeling.
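The phone-level quantization described in the abstract can be illustrated with a minimal sketch. The abstract does not give the details, so everything specific here is an assumption made for illustration: the use of mean log-F0 over voiced frames as the per-phone feature, uniform quantization between the observed minimum and maximum, the default of four levels, the convention that index 0 marks a fully unvoiced phone, and the function names themselves. It is not the authors' implementation.

```python
import math


def phone_mean_log_f0(f0_frames, segments):
    """Mean log-F0 per phone over voiced frames (f0 > 0).

    f0_frames: frame-level F0 values in Hz, 0 for unvoiced frames.
    segments:  list of (start, end) frame indices per phone, end exclusive.
    Returns None for a phone with no voiced frames.
    """
    means = []
    for start, end in segments:
        voiced = [math.log(f) for f in f0_frames[start:end] if f > 0]
        means.append(sum(voiced) / len(voiced) if voiced else None)
    return means


def quantize_phone_f0(f0_frames, segments, num_levels=4):
    """Map each phone to a coarse F0 index (hypothetical labeling scheme).

    Index 0 is reserved for fully unvoiced phones (an assumption);
    voiced phones get indexes 1..num_levels by uniform quantization
    of the phone-level mean log-F0.
    """
    means = phone_mean_log_f0(f0_frames, segments)
    voiced = [m for m in means if m is not None]
    if not voiced:
        return [0] * len(segments)

    lo, hi = min(voiced), max(voiced)
    width = (hi - lo) / num_levels
    labels = []
    for m in means:
        if m is None:
            labels.append(0)
        elif width == 0:
            # All voiced phones share the same mean: use the lowest level.
            labels.append(1)
        else:
            idx = int((m - lo) / width)
            # Clamp so the maximum value falls in the top level.
            labels.append(min(idx, num_levels - 1) + 1)
    return labels


if __name__ == "__main__":
    f0 = [0.0, 0.0, 100.0, 100.0, 200.0, 200.0, 150.0, 150.0]
    phones = [(0, 2), (2, 4), (4, 6), (6, 8)]
    print(quantize_phone_f0(f0, phones))
```

In a setup like this, the resulting index sequence would stand in for manually annotated prosodic labels as context for model training, and, because it is a short symbol sequence per phone, it could also be transmitted cheaply in a coding scenario.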
Pages: 578-581
Number of pages: 4
Related papers
50 in total
  • [41] VOICE CONVERSION BASED ON SIMULTANEOUS MODELING OF SPECTRUM AND F0
    Yutani, Kaori
    Uto, Yosuke
    Nankaku, Yoshihiko
    Lee, Akinobu
    Tokuda, Keiichi
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS, 2009, : 3897 - 3900
  • [42] A Method for Automatically Estimating F0 Model Parameters and A Speech Re-Synthesis Tool Using F0 Model and STRAIGHT
    Sato, Shota
    Kimura, Taro
    Horiuchi, Yasuo
    Nishida, Masafumi
    Kuroiwa, Shingo
    Ichikawa, Akira
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 545 - +
  • [43] A STYLE CAPTURING APPROACH TO F0 TRANSFORMATION IN VOICE CONVERSION
    Anumanchipalli, Gopala Krishna
    Oliveira, Luis C.
    Black, Alan W.
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 6915 - 6919
  • [44] CinC-GAN for Effective F0 prediction for Whisper-to-Normal Speech Conversion
    Patel, Maitreya
    Purohit, Mirali
    Shah, Jui
    Patil, Hemant A.
    28TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2020), 2021, : 411 - 415
  • [45] Using Noisy Speech to Study the Robustness of a Continuous F0 Modelling Method in HMM-based Speech Synthesis
    Ogbureke, Kalu U.
    Cabral, Joao P.
    Carson-Berndsen, Julie
    PROCEEDINGS OF THE 6TH INTERNATIONAL CONFERENCE ON SPEECH PROSODY, VOLS I AND II, 2012, : 67 - 70
  • [46] A Novel Model of F0 Contours Prediction for Continuous Speech
    胡文英
    祖漪清
    王志中
    Journal of Shanghai Jiaotong University, 2005, (03) : 231 - 235
  • [47] Maximising objective speech intelligibility by local f0 modulation
    Villegas, Julian
    Cooke, Martin
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1702 - 1705
  • [48] F0 slope and mean: cues to speech segmentation in French
    Cordero, Maria del Mar
    Meunier, Fanny
    Grimault, Nicolas
    Pota, Stephane
    Spinelli, Elsa
    INTERSPEECH 2020, 2020, : 1610 - 1614
  • [49] TRANSFORMATION OF F0 CONTOURS FOR LEXICAL TONES IN CONCATENATIVE SPEECH SYNTHESIS OF TONAL LANGUAGES
    Trung-Nghia Phung
    Luong, Mai Chi
    Akagi, Masato
    2012 INTERNATIONAL CONFERENCE ON SPEECH DATABASE AND ASSESSMENTS, 2012, : 129 - 134
  • [50] JOINT MODELLING OF VOICING LABEL AND CONTINUOUS F0 FOR HMM BASED SPEECH SYNTHESIS
    Yu, K.
    Young, S.
    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 4572 - 4575