A Dynamic Cost Weighting Framework for Unit Selection Text-to-Speech Synthesis

Cited by: 9
Authors
Bellegarda, Jerome R. [1]
Affiliation
[1] Apple Comp Inc, Speech & Language Technol, Cupertino, CA 95014 USA
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2010, Vol. 18, No. 06
Keywords
Candidate ranking; concatenation-specific cost weighting; concatenative speech synthesis; multiple information streams; unit selection;
DOI
10.1109/TASL.2009.2035209
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Unit selection text-to-speech synthesis relies on multiple cost criteria, each encapsulating a different aspect of acoustic and prosodic context at any given concatenation point. Constraints are normally invoked on diverse characteristics such as inter-unit discontinuity, overall pitch contour, local duration profile, etc., leading to costs often too heterogeneous for a direct quantitative comparison. In order to rank available candidate units, this complexity must be reduced to a single number, and the relative importance of each information stream becomes highly critical. Yet this influence is typically determined in an empirical manner (e.g., based on a limited amount of synthesized data), yielding global weights that are thus applied to broad classes of concatenations indiscriminately. This paper proposes an alternative approach, dynamic cost weighting, based on a data-driven framework separately optimized for each concatenation considered. Specifically, the cost distribution in every stream is dynamically leveraged on a per-concatenation basis to locally shift weight towards those characteristics that offer a high discrimination between candidate units, and away from those characteristics that are intrinsically less discriminative. An illustrative case study demonstrates the potential benefits of this solution, and listening evidence suggests that it does indeed entail higher perceived TTS quality.
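
The per-concatenation idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not say how discrimination is measured, so the sketch assumes non-negative costs and uses each stream's coefficient of variation across the candidates as a stand-in. The function names `dynamic_weights` and `rank_candidates` and the sample numbers are hypothetical.

```python
import numpy as np

def dynamic_weights(costs):
    """Weights for ONE concatenation from a (n_candidates, n_streams) cost array.

    Assumption (not from the paper): a stream's discrimination is taken to be
    the coefficient of variation of its costs across the candidates.
    """
    costs = np.asarray(costs, dtype=float)
    # Divide each stream by its mean so heterogeneous units become comparable.
    mean = costs.mean(axis=0)
    safe_mean = np.where(mean > 0, mean, 1.0)
    scaled = costs / safe_mean
    # Spread of the scaled costs: large when candidates differ a lot.
    spread = scaled.std(axis=0)
    total = spread.sum()
    if total == 0:
        # No stream separates the candidates: fall back to equal weights.
        weights = np.full(costs.shape[1], 1.0 / costs.shape[1])
    else:
        weights = spread / total
    return weights, scaled

def rank_candidates(costs):
    """Return candidate indices ordered from lowest to highest combined cost."""
    weights, scaled = dynamic_weights(costs)
    combined = scaled @ weights
    return np.argsort(combined, kind="stable"), weights

# Hypothetical costs: 3 candidate units, 2 streams (e.g. discontinuity, pitch).
example = [[0.2, 5.00],
           [0.9, 5.10],
           [0.5, 5.05]]
order, weights = rank_candidates(example)
```

In this made-up example the second stream barely varies across candidates, so nearly all of the weight moves to the first stream, which is the qualitative behaviour the abstract describes. A global-weight scheme would instead apply the same fixed weights to every concatenation.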
Pages: 1455 - 1463
Page count: 9
Related papers
50 in total
  • [31] Issues in text-to-speech synthesis
    Macchi, M
    IEEE INTERNATIONAL JOINT SYMPOSIA ON INTELLIGENCE AND SYSTEMS - PROCEEDINGS, 1998, : 318 - 325
  • [32] Modelling speech temporal structure for Estonian text-to-speech synthesis: Feature selection
    Mihkla, Meelis
    TRAMES-JOURNAL OF THE HUMANITIES AND SOCIAL SCIENCES, 2007, 11 (03): : 284 - 298
  • [33] Applying Scalable Phonetic Context Similarity in Unit Selection of Concatenative Text-to-Speech
    Zhang, Wei
    Cui, Xiaodong
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 154 - 157
  • [34] Scalable implementation of unit selection based text-to-speech system for embedded solutions
    Nukaga, Nobuo
    Kamoshida, Ryota
    Nagamatsu, Kenji
    Kitahara, Yoshinori
    2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13, 2006, : 849 - 852
  • [35] Multilingual text analysis for text-to-speech synthesis
    Bell Lab, Murray Hill, United States
    International Conference on Spoken Language Processing, ICSLP, Proceedings, 1996, 3 : 1365 - 1368
  • [36] Multilingual text analysis for text-to-speech synthesis
    Sproat, R
    ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 1365 - 1368
  • [37] On the Construction of Unit Databanks for Text-to-Speech Systems
    Latsch, Vagner L.
    Netto, Sergio L.
    PROCEEDINGS OF THE IEEE INTERNATIONAL TELECOMMUNICATIONS SYMPOSIUM, VOLS 1 AND 2, 2006, : 340 - 343
  • [38] A Unified Framework for Multilingual Text-to-Speech Synthesis with SSML Specification as Interface
    吴志勇
    曹光琦
    蒙美玲
    蔡莲红
    Tsinghua Science and Technology, 2009, 14 (05) : 623 - 630
  • [39] Conditional Random Fields for Hierarchical Segment Selection in Text-to-Speech Synthesis
    Weiss, Christian
    Hess, Wolfgang
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 2026 - 2029
  • [40] A LEXICON SET AND SOFTWARE FRAMEWORK FOR TURKISH TEXT-TO-SPEECH SYNTHESIS APPLICATIONS
    Yilmaz, Asim Egemen
    JOURNAL OF THE FACULTY OF ENGINEERING AND ARCHITECTURE OF GAZI UNIVERSITY, 2009, 24 (04): : 735 - 744