The Target Cost Formulation in Unit Selection Speech Synthesis

Cited by: 0

Authors:

Taylor, Paul [1]

Affiliation:

[1] Univ Cambridge, Dept Engn, Cambridge CB2 1PZ, England

Source:

INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5 | 2006

Keywords:

speech synthesis; unit selection; target cost; decision trees; neural networks

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

We review the various approaches that have been used to define the target cost in unit selection speech synthesis and show that there are a number of different and sometimes incompatible ways of defining this. We propose that this cost should be thought of as a measure of how similar two units sound to a human listener. We discuss the issue of what features should be used in unit selection and the pros and cons of using derived features such as F0. We then explore some algorithms used to calculate target costs and show that none are really ideal for the problem. Finally, we propose a new solution to this that uses a neural network to synthesise points in acoustic space around which we can build new clusters of units at run time.

Pages: 2038-2041

Page count: 4
