A Covariance-Tying Technique for HMM-Based Speech Synthesis

Cited by: 10
Authors
Oura, Keiichiro [1]
Zen, Heiga [1]
Nankaku, Yoshihiko [1]
Lee, Akinobu [1]
Tokuda, Keiichi [1]
Affiliations
[1] Nagoya Inst Technol, Dept Comp Sci & Engn, Nagoya, Aichi 4668555, Japan
Keywords
HMM; speech synthesis; decision tree; context-clustering; MDL criterion; embedded device;
DOI
10.1587/transinf.E93.D.595
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
A technique for reducing the footprints of HMM-based speech synthesis systems by tying all covariance matrices of state distributions is described. HMM-based speech synthesis systems usually leave smaller footprints than unit-selection synthesis systems because they store statistics rather than speech waveforms. However, further reduction is essential to put them on embedded devices, which have limited memory. In accordance with the empirical knowledge that covariance matrices have a smaller impact on the quality of synthesized speech than mean vectors, we propose a technique for clustering mean vectors while tying all covariance matrices. Subjective listening test results showed that the proposed technique can shrink the footprints of an HMM-based speech synthesis system while retaining the quality of the synthesized speech.
Pages: 595-601
Number of pages: 7
Related papers
50 in total
  • [41] Evaluation of prosodic contextual factors for HMM-based speech synthesis
    Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology, Yokohama, 226-8502, Japan
    Proc. Annu. Conf. Int. Speech Commun. Assoc., INTERSPEECH: 430 - 433
  • [42] FACTORED MLLR ADAPTATION FOR HMM-BASED EXPRESSIVE SPEECH SYNTHESIS
    Sung, June Sig
    Hong, Doo Hwa
    Lee, Chul Min
    Kim, Nam Soo
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 974 - 977
  • [43] Formant-controlled HMM-based Speech Synthesis
    Lei, Ming
    Yamagishi, Junichi
    Richmond, Korin
    Ling, Zhen-Hua
    King, Simon
    Dai, Li-Rong
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 2788 - +
  • [44] Robust Voicing Detection and Estimation for HMM-Based Speech Synthesis
    Narendra, N. P.
    Rao, K. Sreenivasa
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2015, 34 (08) : 2597 - 2619
  • [45] An acoustic model adaptation using hmm-based speech synthesis
    Tanaka, K
    Kuroiwa, S
    Tsuge, S
    Ren, F
    2003 INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING, PROCEEDINGS, 2003, : 368 - 373
  • [46] Two-band excitation for HMM-based speech synthesis
    Kim, Sang-Jin
    Hahn, Minsoo
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2007, E90D (01) : 378 - 381
  • [47] FACTOR ANALYZED VOICE MODELS FOR HMM-BASED SPEECH SYNTHESIS
    Kazumi, Kyosuke
    Nankaku, Yoshihiko
    Tokuda, Keiichi
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4234 - 4237
  • [48] Data Selection and Adaptation for Naturalness in HMM-based Speech Synthesis
    Cooper, Erica
    Chang, Alison
    Levitan, Yocheved
    Hirschberg, Julia
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 357 - +
  • [49] Emotion transplantation through adaptation in HMM-based speech synthesis
    Lorenzo-Trueba, Jaime
    Barra-Chicote, Roberto
    San-Segundo, Ruben
    Ferreiros, Javier
    Yamagishi, Junichi
    Montero, Juan M.
    COMPUTER SPEECH AND LANGUAGE, 2015, 34 (01) : 292 - 307
  • [50] CONTEXTUAL PARTIAL ADDITIVE STRUCTURE FOR HMM-BASED SPEECH SYNTHESIS
    Takaki, Shinji
    Nankaku, Yoshihiko
    Tokuda, Keiichi
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7878 - 7882