Optimal weight tuning method for unit selection cost functions in syllable based text-to-speech synthesis

被引:8
|
作者
Narendra, N. P. [1 ]
Rao, K. Sreenivasa [1 ]
机构
[1] Indian Inst Technol Kharagpur, Sch Informat Technol, Kharagpur 721302, W Bengal, India
关键词
Text-to-speech synthesis; Unit selection; Target cost; Concatenation cost; Tuning of weights; Genetic algorithm; SYNTHESIS SYSTEM;
D O I
10.1016/j.asoc.2012.09.023
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
This paper proposes a method for tuning the weights of unit selection cost functions in syllable based text-to-speech (TTS) synthesis system. In this work, unit selection cost functions, namely target cost and concatenation cost, are designed appropriate to syllables. The method tunes the weights in such a way that perceptual preference patterns are appropriately considered while selecting the units. The method uses genetic algorithm to derive the optimal weights. Fitness function is designed to map perceptual preference patterns into weights of unit selection cost functions. The effectiveness of proposed method is evaluated by both subjective and objective measures. From the results, it is observed that the derived optimal weights can synthesize good quality speech compared to manually tuned weights. (C) 2012 Published by Elsevier B.V.
引用
收藏
页码:773 / 781
页数:9
相关论文
共 50 条
  • [1] Syllable specific unit selection cost functions for text-to-speech synthesis
    Narendra, N.P.
    Sreenivasa Rao, K.
    [J]. ACM Transactions on Speech and Language Processing, 2012, 9 (03):
  • [2] Extracting user preferences by GTM for aiGA weight tuning in unit selection text-to-speech synthesis
    Formiga, Lluis
    Alias, Francese
    [J]. COMPUTATIONAL AND AMBIENT INTELLIGENCE, 2007, 4507 : 654 - +
  • [3] Globally optimal training of unit boundaries in unit selection text-to-speech synthesis
    Bellegarda, Jerome R.
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (03): : 957 - 965
  • [4] A Dynamic Cost Weighting Framework for Unit Selection Text-to-Speech Synthesis
    Bellegarda, Jerome R.
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2010, 18 (06): : 1455 - 1463
  • [5] Intensity Modeling for Syllable Based Text-to-Speech Synthesis
    Reddy, V. Ramu
    Rao, K. Sreenivasa
    [J]. CONTEMPORARY COMPUTING, 2012, 306 : 106 - 117
  • [6] Continuity Metric for Unit Selection based Text-to-Speech Synthesis
    Lakkavalli, Vikram Ramesh
    Arulmozhi, P.
    Ramakrishnan, A. G.
    [J]. 2010 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS (SPCOM), 2010,
  • [7] A Text-to-Speech Platform for Variable Length Optimal Unit Searching Using Perception Based Cost Functions
    Minkyu Lee
    Daniel P. Lopresti
    Joseph P. Olive
    [J]. International Journal of Speech Technology, 2003, 6 (4) : 347 - 356
  • [8] Efficient Unit-Selection in Text-to-Speech Synthesis
    Mihelic, Ales
    Gros, Jerneja Zganec
    [J]. TEXT, SPEECH AND DIALOGUE, PROCEEDINGS, 2008, 5246 : 411 - 418
  • [9] Diphone-based unit selection for Catalan text-to-speech synthesis
    Guaus, R
    Iriondo, I
    [J]. TEXT, SPEECH AND DIALOGUE, PROCEEDINGS, 2000, 1902 : 277 - 282
  • [10] An efficient unit-selection method for concatenative Text-to-speech synthesis systems
    Gros, Jerneja Zganec
    Zganec, Mario
    [J]. Journal of Computing and Information Technology, 2008, 16 (01) : 69 - 78