OPTIMIZATION OF COST FUNCTION WEIGHTS FOR UNIT SELECTION SPEECH SYNTHESIS USING SPEECH RECOGNITION

Cited by: 1

Authors:

Pobar, Miran [1]

Martincic-Ipsic, Sanda [1]

Ipsic, Ivo [1]

Affiliations:

[1] Univ Rijeka, Dept Informat, Rijeka 51000, Croatia

Source:

NEURAL NETWORK WORLD | 2012 / Vol. 22 / No. 05

Keywords:

Speech synthesis; statistical parametrical synthesis; unit selection; weight tuning;

DOI:

10.14311/NNW.2012.22.026

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

A well-known problem in unit selection speech synthesis is designing the join and target sub-costs and optimizing their corresponding weights so that they reflect human listeners' preferences. To address this, we propose a procedure that uses an objective criterion for optimal speech unit selection. This criterion, used to tune the cost function weights, is based on automatic speech recognition results. To demonstrate the effectiveness of the proposed method, listening tests with 31 naive listeners were performed. The experimental results show that the proposed method improves speech quality and intelligibility. To evaluate the quality of the synthesized speech, the unit selection system is compared with two other Croatian speech synthesis systems whose voices were built from the same recorded speech corpus. One of these voices was built with the Festival speech synthesis system using the statistical parametric method; the other is a diphone-concatenation text-to-speech system. The comparison is based on subjective tests using mean opinion score (MOS) evaluation. According to the subjective tests, the system whose cost function weights were optimized with the proposed method performs better than the other two systems.
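The abstract does not give the cost function or the tuning procedure in detail, so the following is only a minimal sketch of the general idea, under stated assumptions: the total cost is taken to be a weighted sum of target and join sub-costs, the weights are searched over a small grid, and `recognition_error` is a hypothetical stand-in for scoring synthesized speech with a speech recognizer. All names and the toy data are illustrative and are not taken from the paper.

```python
from itertools import product


def total_cost(target_costs, join_costs, w_target, w_join):
    """Weighted sum of target and join sub-costs for one candidate unit sequence."""
    return w_target * sum(target_costs) + w_join * sum(join_costs)


def select_sequence(candidates, w_target, w_join):
    """Return the candidate sequence with the lowest weighted cost."""
    return min(
        candidates,
        key=lambda c: total_cost(c["target"], c["join"], w_target, w_join),
    )


def tune_weights(candidates, recognition_error, grid):
    """Grid search: keep the weights whose selected sequence has the lowest error.

    recognition_error is an assumed callable that scores a selected sequence;
    in a real system it would run a recognizer on the synthesized audio.
    """
    best = None
    for w_target, w_join in product(grid, repeat=2):
        chosen = select_sequence(candidates, w_target, w_join)
        err = recognition_error(chosen)
        if best is None or err < best[0]:
            best = (err, w_target, w_join)
    return best


# Toy data (hypothetical): two candidate sequences with precomputed sub-costs
# and a made-up recognition error attached to each.
candidates = [
    {"name": "a", "target": [1.0], "join": [5.0], "err": 0.4},
    {"name": "b", "target": [4.0], "join": [1.0], "err": 0.1},
]

best = tune_weights(candidates, lambda c: c["err"], grid=[0.0, 1.0])
print(best)  # (error, w_target, w_join) of the best weight pair found
```

Grid search is used here only because it is the simplest way to show an objective score driving the weight choice; the paper may use a different search strategy.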


Pages: 429-441

Page count: 13

50 items in total

[1] The Target Cost Formulation in Unit Selection Speech Synthesis
Taylor, Paul
[J]. INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006: 2038 - 2041
[2] Assessing a Speaker for Fast Speech in Unit Selection Speech Synthesis
Moers, Donata
Wagner, Petra
[J]. INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 2015 - +
[3] Implementation and verification of speech database for unit selection speech synthesis
Szklanny, Krzysztof
Koszuta, Sebastian
[J]. PROCEEDINGS OF THE 2017 FEDERATED CONFERENCE ON COMPUTER SCIENCE AND INFORMATION SYSTEMS (FEDCSIS), 2017: 1263 - 1267
[4] Unit selection speech synthesis in noise
Cernak, Milos
[J]. 2006 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-13, 2006: 761 - 764
[5] Recording and annotation of speech corpus for Czech unit selection speech synthesis
Matousek, Jindrich
Romportl, Jan
[J]. TEXT, SPEECH AND DIALOGUE, PROCEEDINGS, 2007, 4629 : 326 - +
[6] Polish unit selection speech synthesis with BOSS: extensions and speech corpora
Demenko, Grazyna
Klessa, Katarzyna
Szymanski, Marcin
Breuer, Stefan
Hess, Wolfgang
[J]. INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2010, 13 (02) : 85 - 99
[7] Subjective evaluation of join cost and smoothing methods for unit selection speech synthesis
Vepa, Jithendra
King, Simon
[J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2006, 14 (05): 1763 - 1771
[8] Syllable specific unit selection cost functions for text-to-speech synthesis
Narendra, N.P.
Sreenivasa Rao, K.
[J]. ACM Transactions on Speech and Language Processing, 2012, 9 (03):
[9] A Dynamic Cost Weighting Framework for Unit Selection Text-to-Speech Synthesis
Bellegarda, Jerome R.
[J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2010, 18 (06): 1455 - 1463
[10] Defining a Global Adaptive Duration Target Cost for Unit Selection Speech Synthesis
Guennec, David
Chevelu, Jonathan
Lolive, Damien
[J]. TEXT, SPEECH, AND DIALOGUE (TSD 2015), 2015, 9302 : 149 - 157
