OPTIMIZATION OF COST FUNCTION WEIGHTS FOR UNIT SELECTION SPEECH SYNTHESIS USING SPEECH RECOGNITION

Cited by: 1
Authors
Pobar, Miran [1]
Martincic-Ipsic, Sanda [1]
Ipsic, Ivo [1]
Affiliation
[1] Univ Rijeka, Dept Informat, Rijeka 51000, Croatia
Keywords
Speech synthesis; statistical parametrical synthesis; unit selection; weight tuning
DOI
10.14311/NNW.2012.22.026
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
A well-known problem in unit selection speech synthesis is designing the join and target function sub-costs and optimizing their weights so that they reflect human listeners' preferences. To address this, we propose a procedure that uses an objective criterion for optimal speech unit selection: the cost function weights are tuned based on automatic speech recognition results. To demonstrate the effectiveness of the proposed method, listening tests with 31 naive listeners were performed. The experimental results show that the proposed method improves speech quality and intelligibility. To evaluate the quality of the synthesized speech, the unit selection speech synthesis system is compared with two other Croatian speech synthesis systems whose voices were built using the same recorded speech corpus. One of these voices was built with the Festival speech synthesis system using the statistical parametric method, and the other is a diphone concatenation based text-to-speech system. The comparison is based on subjective tests using mean opinion score (MOS) evaluation. According to the subjective tests, the system whose cost function weights were optimized with the proposed method performs better than the other two systems.
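The idea of tuning cost function weights against a recognition-based criterion can be illustrated with a minimal sketch. The abstract does not describe the search procedure, the recognizer, or the sub-cost definitions, so everything here is an assumption made for illustration: the exhaustive grid search, the toy candidate data, the precomputed `asr_acc` values, and the names `total_cost`, `select_units`, `tune_weights` and `recognition_accuracy` are all hypothetical and are not the authors' implementation.

```python
from itertools import product

def total_cost(target_costs, join_costs, w_target, w_join):
    """Weighted sum of target and join sub-costs for one candidate unit sequence."""
    return w_target * sum(target_costs) + w_join * sum(join_costs)

def select_units(candidates, w_target, w_join):
    """Pick the candidate sequence with the lowest weighted cost (toy-sized search)."""
    return min(
        candidates,
        key=lambda c: total_cost(c["target"], c["join"], w_target, w_join),
    )

def tune_weights(utterances, recognition_accuracy, grid):
    """Grid search: keep the weight pair whose selected units score best on average."""
    best = None
    for w_target, w_join in product(grid, repeat=2):
        chosen = [select_units(cands, w_target, w_join) for cands in utterances]
        score = sum(recognition_accuracy(c) for c in chosen) / len(chosen)
        if best is None or score > best[0]:
            best = (score, w_target, w_join)
    return best

# Hypothetical data: two utterances, each with two candidate unit sequences.
# "asr_acc" stands in for the accuracy a recognizer would report on the
# synthesized candidate; a real system would synthesize and recognize it.
utterances = [
    [
        {"target": [1.0], "join": [0.1], "asr_acc": 0.6},
        {"target": [0.2], "join": [0.8], "asr_acc": 0.9},
    ],
    [
        {"target": [0.5], "join": [0.9], "asr_acc": 0.5},
        {"target": [0.9], "join": [0.2], "asr_acc": 0.8},
    ],
]

best = tune_weights(utterances, lambda c: c["asr_acc"], grid=[0.25, 0.75])
print(best)  # (average accuracy, target weight, join weight)
```

In this toy setup the weights only matter through which candidate they cause to be selected, so several weight pairs can tie; the sketch keeps the first pair that reaches the best average score.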
Pages: 429-441
Page count: 13