Objective Intelligibility Assessment of Text-to-Speech Systems Through Utterance Verification

Authors
Ullmann, Raphael [1 ,2 ]
Rasipuram, Ramya [1 ]
Magimai-Doss, Mathew [1 ]
Bourlard, Herve [1 ,2 ]
Affiliations
[1] Idiap Res Inst, Martigny, Switzerland
[2] Ecole Polytech Fed Lausanne, CH-1015 Lausanne, Switzerland
Keywords
Speech intelligibility; objective measures; text-to-speech synthesis; utterance verification; KL-divergence; KL-HMM; MODEL;
DOI
Not available
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Objective assessment of synthetic speech intelligibility can be a useful tool for the development of text-to-speech (TTS) systems, as it provides a reproducible and inexpensive alternative to subjective listening tests. In a recent work, it was shown that the intelligibility of synthetic speech could be assessed objectively by comparing two sequences of phoneme class conditional probabilities, corresponding to instances of synthetic and human reference speech, respectively. In this paper, we build on those findings to propose a novel approach that formulates objective intelligibility assessment as an utterance verification problem using hidden Markov models, thereby alleviating the need for human reference speech. Specifically, given each text input to the TTS system, the proposed approach automatically verifies the words in the output synthetic speech signal and estimates an intelligibility score based on word recall statistics. We evaluate the proposed approach on the 2011 Blizzard Challenge data, and show that the estimated scores and the subjective intelligibility scores are highly correlated (Pearson's |R| = 0.94).
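The scoring and evaluation steps named in the abstract (a word-recall score per TTS system, then a Pearson correlation against subjective scores) can be illustrated with a minimal sketch. It assumes the HMM-based verifier has already produced one accept/reject decision per word; the verifier itself is not modeled here. The function names, the pooling of words across utterances, and all numbers are hypothetical and are not taken from the paper.

```python
from math import sqrt


def word_recall(decisions):
    """Pooled word recall: accepted words divided by all words.

    `decisions` is a list of utterances, each a list of booleans
    (True = the verifier accepted the word). Pooling across
    utterances is an assumption of this sketch.
    """
    total = sum(len(utt) for utt in decisions)
    if total == 0:
        raise ValueError("no words to score")
    accepted = sum(1 for utt in decisions for ok in utt if ok)
    return accepted / total


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical verifier decisions for three made-up TTS systems.
    systems = {
        "A": [[True, True, True], [True, False, True]],
        "B": [[True, False, True], [False, False, True]],
        "C": [[False, False, True], [False, True, False]],
    }
    # Hypothetical subjective word error rates from listening tests.
    subjective_wer = {"A": 0.15, "B": 0.40, "C": 0.70}

    names = sorted(systems)
    objective = [word_recall(systems[name]) for name in names]
    subjective = [subjective_wer[name] for name in names]
    # The absolute value is reported because recall and error rate
    # move in opposite directions.
    print(abs(pearson_r(objective, subjective)))
```

Taking the absolute value mirrors the |R| notation in the abstract: when the subjective measure is an error rate, a good objective recall score correlates negatively with it.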
Pages: 3501-3505
Page count: 5