Comparison of Approaches for Instrumentally Predicting the Quality of Text-To-Speech Systems

Cited by: 0
Authors
Moeller, Sebastian [1 ]
Hinterleitner, Florian [1 ]
Falk, Tiago H. [2 ]
Polzehl, Tim [1 ]
Affiliations
[1] TU Berlin, Qual & Usabil Lab, Deutsch Telekom Labs, Berlin, Germany
[2] Bloorview Res Inst, Toronto, ON, Canada
Keywords
speech synthesis; quality prediction; Quality of Experience (QoE)
DOI
Not available
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we compare and combine different approaches for instrumentally predicting the perceived quality of Text-to-Speech systems. First, a log-likelihood is determined by comparing features extracted from the synthesized speech signal with features trained on natural speech. Second, parameters are extracted which capture quality-relevant degradations of the synthesized speech signal. Both approaches are combined and evaluated on three auditory test databases. The results show that auditory quality judgments can in many cases be predicted with a sufficiently high accuracy and reliability, but that there are considerable differences, mainly between male and female speech samples.
Pages: 1325 / +
Page count: 2
Related papers
50 in total
  • [1] Comparison of measures of speech quality for listening tests of text-to-speech systems
    Viswanathan, M
    Viswanathan, M
    [J]. PROCEEDINGS OF THE 2002 IEEE WORKSHOP ON SPEECH SYNTHESIS, 2002, : 11 - 14
  • [2] Subjective evaluation and comparison of the speech quality of text-to-speech systems for the German language
    Klaus, H
    Fellbaum, K
    Sotscheck, J
    [J]. ACUSTICA, 1997, 83 (01): : 124 - 136
  • [3] Perceptual Quality Dimensions of Text-to-Speech Systems
    Hinterleitner, Florian
    Moeller, Sebastian
    Norrenbrock, Christoph
    Heute, Ulrich
    [J]. 12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 2188 - 2191
  • [4] Enhancing the Quality of Nepali Text-to-Speech Systems
    Ghimire, Rupak Raj
    Bal, Bal Krishna
    [J]. CREATIVITY IN INTELLIGENT TECHNOLOGIES AND DATA SCIENCE, (CIT&DS), 2017, 754 : 187 - 197
  • [5] MOS and pair comparison combined methods for quality evaluation of text-to-speech systems
    Salza, PL
    Foti, E
    Nebbia, L
    Oreglia, M
    [J]. ACUSTICA, 1996, 82 (04): : 650 - 656
  • [6] Predicting the Quality of Text-To-Speech Systems from a Large-Scale Feature Set
    Hinterleitner, Florian
    Norrenbrock, Christoph R.
    Moeller, Sebastian
    Heute, Ulrich
    [J]. 14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 383 - 387
  • [7] Evaluating the pronunciation component of text-to-speech systems for English: a performance comparison of different approaches
    Damper, RI
    Marchand, Y
    Adamson, MJ
    Gustafson, K
    [J]. COMPUTER SPEECH AND LANGUAGE, 1999, 13 (02): : 155 - 176
  • [8] Physiological Quality-of-Experience Assessment of Text-to-Speech Systems
    Gupta, Rishabh
    Falk, Tiago H.
    [J]. 2016 IEEE 18TH INTERNATIONAL WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING (MMSP), 2016,
  • [9] A text analyzer for Korean text-to-speech systems
    Lee, SH
    Oh, YH
    [J]. ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 1692 - 1695
  • [10] Evaluation of Deep Learning Approaches to Text-to-Speech Systems for European Portuguese
    Quintas, Sebastiao
    Trancoso, Isabel
    [J]. COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE, PROPOR 2020, 2020, 12037 : 34 - 42