HMM-Based Emphatic Speech Synthesis Using Unsupervised Context Labeling

Times Cited: 0
Authors
Maeno, Yu [1 ]
Nose, Takashi [1 ]
Kobayashi, Takao [1 ]
Ijima, Yusuke [2 ]
Nakajima, Hideharu [2 ]
Mizuno, Hideyuki [2 ]
Yoshioka, Osamu [2 ]
Affiliations
[1] Tokyo Inst Technol, Interdisciplinary Grad Sch Sci & Engn, Tokyo, Japan
[2] NTT Corp, NTT Cyber Space Labs, Japan
Keywords
HMM-based speech synthesis; expressive speech; emphasis expression; unsupervised labeling; F0 generation; EMPHASIS;
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper describes an approach to HMM-based expressive speech synthesis that does not require any supervised labeling of emphasis context. We use appealing-style speech whose sentences were taken from real domains. To reduce the cost of labeling the speech data with emphasis contexts for model training, we propose an unsupervised emphasis-context labeling technique based on the difference between the original and generated F0 patterns of the training sentences. Although the labeling criterion is quite simple, subjective evaluation results reveal that the unsupervised labeling is comparable to careful manual labeling in terms of speech naturalness and emphasis reproducibility.
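The abstract only states that emphasis-context labels are derived from the difference between the original and generated F0 patterns of the training sentences. The Python sketch below shows one way such a criterion could be realized; the log-scale comparison, the phrase segmentation, the 0.15 threshold, and the three-way label set are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def label_emphasis_unsupervised(orig_f0, gen_f0, phrase_bounds, threshold=0.15):
    """Hypothetical sketch of unsupervised emphasis-context labeling.

    For each phrase, compare the original F0 contour with the contour
    generated by a baseline (emphasis-unaware) HMM model.  If the original
    F0 exceeds the generated one by more than `threshold` in the log domain
    (an assumed margin), the phrase is labeled "emphasis"; if it falls below
    by the same margin, "suppressed"; otherwise "neutral".

    orig_f0, gen_f0 : frame-aligned F0 arrays in Hz (0 marks unvoiced frames)
    phrase_bounds   : list of (start_frame, end_frame) phrase segments
    """
    labels = []
    for start, end in phrase_bounds:
        o = orig_f0[start:end]
        g = gen_f0[start:end]
        voiced = (o > 0) & (g > 0)          # compare voiced frames only
        if not np.any(voiced):
            labels.append("neutral")
            continue
        # mean log-F0 difference between natural and generated contours
        diff = np.mean(np.log(o[voiced]) - np.log(g[voiced]))
        if diff > threshold:
            labels.append("emphasis")
        elif diff < -threshold:
            labels.append("suppressed")
        else:
            labels.append("neutral")
    return labels
```

In this reading, the labels produced for each phrase would be appended to the context-dependent label files and the HMMs retrained with the enriched context set; the exact context format is not specified in the abstract.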
Pages: 1860 / +
Number of pages: 2
Related Papers
50 records in total
  • [1] HMM-Based Thai Speech Synthesis Using Unsupervised Stress Context Labeling
    Moungsri, Decha
    Koriyama, Tomoki
    Kobayashi, Takao
    [J]. 2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2014,
  • [2] Prosodic variation enhancement using unsupervised context labeling for HMM-based expressive speech synthesis
    Maeno, Yu
    Nose, Takashi
    Kobayashi, Takao
    Koriyama, Tomoki
    Ijima, Yusuke
    Nakajima, Hideharu
    Mizuno, Hideyuki
    Yoshioka, Osamu
    [J]. SPEECH COMMUNICATION, 2014, 57 : 144 - 154
  • [3] Unsupervised adaptation for HMM-based speech synthesis
    King, Simon
    Tokuda, Keiichi
    Zen, Heiga
    Yamagishi, Junichi
    [J]. INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 1869 - +
  • [4] HMM-BASED SPEECH SYNTHESIS WITH UNSUPERVISED LABELING OF ACCENTUAL CONTEXT BASED ON F0 QUANTIZATION AND AVERAGE VOICE MODEL
    Nose, Takashi
    Ooki, Koujirou
    Kobayashi, Takao
    [J]. 2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4622 - 4625
  • [5] DIALOGUE CONTEXT SENSITIVE HMM-BASED SPEECH SYNTHESIS
    Tsiakoulis, Pirros
    Breslin, Catherine
    Gasic, Milica
    Henderson, Matthew
    Kim, Dongho
    Szummer, Martin
    Thomson, Blaise
    Young, Steve
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [6] HMM-BASED EMPHATIC SPEECH SYNTHESIS FOR CORRECTIVE FEEDBACK IN COMPUTER-AIDED PRONUNCIATION TRAINING
    Ning, Yishuang
    Wu, Zhiyong
    Jia, Jia
    Meng, Fanbo
    Meng, Helen
    Cai, Lianhong
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4934 - 4938
  • [7] HMM-BASED EXPRESSIVE SPEECH SYNTHESIS BASED ON PHRASE-LEVEL F0 CONTEXT LABELING
    Maeno, Yu
    Nose, Takashi
    Kobayashi, Takao
    Koriyama, Tomoki
    Ijima, Yusuke
    Nakajima, Hideharu
    Mizuno, Hideyuki
    Yoshioka, Osamu
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7859 - 7863
  • [8] Croatian HMM-based speech synthesis
    Department of Informatics, Faculty of Philosophy, University of Rijeka, Omladinska 14, 51000 Rijeka, Croatia
    [J]. J. Comput. Inf. Technol., 2006, (4): 307 - 313
  • [9] UNSUPERVISED CROSS-LINGUAL SPEAKER ADAPTATION FOR HMM-BASED SPEECH SYNTHESIS
    Oura, Keiichiro
    Tokuda, Keiichi
    Yamagishi, Junichi
    King, Simon
    Wester, Mirjam
    [J]. 2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4594 - 4597
  • [10] HMM-Based Vietnamese Speech Synthesis
    Trinh Quoc Son
    [J]. 2015 IEEE/ACIS 14TH INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION SCIENCE (ICIS), 2015, : 349 - 353