GRADIENT-DESCENT BASED UNIT-SELECTION OPTIMIZATION ALGORITHM USED FOR CORPUS-BASED TEXT-TO-SPEECH SYNTHESIS

Cited by: 3
Authors
Rojc, Matej [1]
Kacic, Zdravko [1]
Affiliations
[1] Univ Maribor, Fac Elect Engn & Comp Sci, SLO-2000 Maribor, Slovenia
Keywords
SYNTHESIS SYSTEM;
DOI
10.1080/08839514.2011.595645
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper proposes a gradient-descent based unit-selection optimization algorithm that tunes unit-cost function weights and improves the overall performance of the unit-selection algorithm used in a corpus-based text-to-speech (TTS) synthesis system. The presented unit-selection algorithm uses complex multidimensional and fuzzy-logic based unit-cost functions. The weights used by these unit-cost functions are usually set by heuristics or by listening tests. This can be laborious and time-consuming, and it does not necessarily yield optimal performance, because the features of different database candidates are evaluated within a multidimensional unit-cost function space. Heuristics and listening tests are also rather rigid, especially when working with several different databases or voices, where it is particularly difficult to set the weights so that the unit-selection algorithm performs optimally overall. The proposed unit-selection optimization process consists of several steps. It is fully automatic, flexible, and fast enough to enable the development of a corpus-based TTS system that uses many different voices, without any heuristics or listening tests. The optimization process can also help when evaluating the performance of unit-selection cost functions and of the unit-selection algorithm itself. The obtained results suggest the values that the unit-selection cost-function weights should have in order to obtain smoother transitions between the selected unit candidates. The results also indicate the performance level that can be achieved with a given set of unit-cost function weights, and what improvement can be gained when additional or modified unit-cost functions are included in the unit-selection algorithm.
Pages: 635-668
Number of pages: 34