GRADIENT-DESCENT BASED UNIT-SELECTION OPTIMIZATION ALGORITHM USED FOR CORPUS-BASED TEXT-TO-SPEECH SYNTHESIS

Cited by: 3
Authors
Rojc, Matej [1 ]
Kacic, Zdravko [1 ]
Affiliations
[1] Univ Maribor, Fac Elect Engn & Comp Sci, SLO-2000 Maribor, Slovenia
Keywords
SYNTHESIS SYSTEM;
DOI
10.1080/08839514.2011.595645
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper proposes a gradient-descent based unit selection optimization algorithm for the optimization of unit-cost function weights and for improving the overall performance of the unit-selection algorithm, as used in a corpus-based text-to-speech synthesis system. Complex multidimensional and fuzzy-logic based unit-cost functions are used in the presented unit-selection algorithm. The weights used by these unit-cost functions are usually defined by heuristics or by listening tests. This can be very laborious and time consuming, and does not necessarily result in an optimal performance of the unit-selection algorithm because of multidimensional unit-cost function space, within which different database candidates' features are evaluated. Using heuristics or listening tests is also rather rigid, especially when working with several different databases or voices. It is especially difficult, within this scope, to set up those weights used in unit-cost functions in order to achieve overall optimal performance of the unit-selection algorithm. The proposed unit-selection optimization process consists of several steps. It is fully automatic, flexible, and fast enough to enable the development of a corpus-based text-to-speech (TTS) system that uses many different voices, without any heuristics or listening tests. This optimization process can also be helpful when evaluating the performances of unit-selection cost functions, and the performance of the unit-selection algorithm itself. The obtained results "suggest" those values that the unit-selection cost-function weights should have in order to obtain smoother transitions between selected unit candidates, after the unit-selection process. The obtained results also hint at the performance level that can be achieved with a given set of unit-cost function weights, and suggest what improvements can be gained when using those additional or changed unit-cost functions included within the unit-selection algorithm.
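The abstract does not state the objective function or the update rule, so the following is only a minimal sketch of the general idea of tuning cost-function weights by gradient descent rather than by hand. It assumes, purely for illustration, that the unit cost is a weighted sum of sub-costs and that the weights are fitted by minimizing a mean squared error against a reference score. The names `weighted_cost`, `fit_weights`, and `TOY_SAMPLES`, the squared-error objective, and the toy data are all hypothetical and are not taken from the paper, whose cost functions are described as multidimensional and fuzzy-logic based.

```python
def weighted_cost(weights, sub_costs):
    """Unit cost modelled as a weighted sum of sub-costs (an assumption)."""
    return sum(w * c for w, c in zip(weights, sub_costs))


def mean_squared_error(weights, samples):
    """Average squared gap between the weighted cost and the reference score."""
    total = sum((weighted_cost(weights, f) - t) ** 2 for f, t in samples)
    return total / len(samples)


def fit_weights(samples, learning_rate=0.1, steps=2000):
    """Plain batch gradient descent on the mean squared error."""
    n = len(samples[0][0])
    weights = [1.0 / n] * n  # start from uniform weights
    for _ in range(steps):
        grads = [0.0] * n
        for sub_costs, target in samples:
            err = weighted_cost(weights, sub_costs) - target
            for i in range(n):
                # d/dw_i of (w . f - t)^2, averaged over the samples
                grads[i] += 2.0 * err * sub_costs[i] / len(samples)
        weights = [w - learning_rate * g for w, g in zip(weights, grads)]
    return weights


# Hypothetical toy data: (sub-cost vector, reference score), generated from
# the weights (0.7, 0.2, 0.1) so that the fit has a known answer.
TOY_SAMPLES = [
    ((1.0, 0.0, 0.0), 0.70),
    ((0.0, 1.0, 0.0), 0.20),
    ((0.0, 0.0, 1.0), 0.10),
    ((0.5, 0.5, 0.0), 0.45),
    ((0.2, 0.3, 0.5), 0.25),
]

if __name__ == "__main__":
    fitted = fit_weights(TOY_SAMPLES)
    print([round(w, 3) for w in fitted])
```

In a real system the reference score would have to come from an objective measure, such as a measure of smoothness at the joins between selected units, which is the kind of outcome the abstract says the optimized weights are meant to improve.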
Pages: 635-668
Page count: 34
Related papers
50 in total
  • [1] Efficient Unit-Selection in Text-to-Speech Synthesis
    Mihelic, Ales
    Gros, Jerneja Zganec
    [J]. TEXT, SPEECH AND DIALOGUE, PROCEEDINGS, 2008, 5246 : 411 - 418
  • [2] Corpus-based Malay Text-to-Speech Synthesis System
    Swee, Tan Tian
    Salleh, Sheikh Hussain Shaikh
    [J]. 2008 14TH ASIA-PACIFIC CONFERENCE ON COMMUNICATIONS, (APCC), VOLS 1 AND 2, 2008, : 52 - 56
  • [3] A study of prosodic variability methods in a corpus-based unit selection text-to-speech system
    Csapo, Tamas Gabor
    Zainko, Csaba
    Nemeth, Geza
    [J]. INFOCOMMUNICATIONS JOURNAL, 2010, 2 (01) : 32 - 37
  • [4] Unit generation based on phrase break strength and pruning for corpus-based text-to-speech
    Kim, S
    Lee, Y
    Hirose, K
    [J]. ETRI JOURNAL, 2001, 23 (04) : 168 - 176
  • [5] A set of corpus-based text-to-speech synthesis technologies for Mandarin Chinese
    Chou, FC
    Tseng, CY
    Lee, LS
    [J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2002, 10 (07) : 481 - 494
  • [6] A new Korean corpus-based text-to-speech system
    Kim S.
    Lee Y.
    Hirose K.
    [J]. International Journal of Speech Technology, 2002, 5 (2) : 105 - 116
  • [7] PERCEPTUAL CLUSTERING BASED UNIT SELECTION OPTIMIZATION FOR CONCATENATIVE TEXT-TO-SPEECH SYNTHESIS
    Jiang, Tao
    Wu, Zhiyong
    Jia, Jia
    Cai, Lianhong
    [J]. 2012 8TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, 2012, : 64 - 68
  • [8] Maximum Likelihood Unit Selection for Corpus-based Speech Synthesis
    Gamboa Rosales, Abubeker
    Rosales, Hamurabi Gamboa
    Hoffmann, Ruediger
    [J]. INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 748 - +
  • [9] Continuity Metric for Unit Selection based Text-to-Speech Synthesis
    Lakkavalli, Vikram Ramesh
    Arulmozhi, P.
    Ramakrishnan, A. G.
    [J]. 2010 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS (SPCOM), 2010,
  • [10] An objective measure for assement of a corpus-based text-to-speech system
    Xu, J
    Guan, CT
    Li, HZ
    [J]. PROCEEDINGS OF THE 2002 IEEE WORKSHOP ON SPEECH SYNTHESIS, 2002, : 179 - 182