Measuring the naturalness of synthetic speech

Authors
Howard C. Nusbaum
Alexander L. Francis
Anne S. Henly
Affiliation
[1] The University of Chicago, Center for Computational Psychology, Committee on Cognition and Communication
Keywords
synthetic speech; naturalness; intelligibility; perception
DOI
10.1007/BF02215800
Abstract
Even the highest quality synthetic speech generated by rule sounds unlike human speech. As the intelligibility of rule-based synthetic speech improves, and the number of applications for synthetic speech increases, the naturalness of synthetic speech will become an important factor in determining its use. In order to improve this aspect of the quality of synthetic speech it is necessary to have diagnostic tests that can measure naturalness. Currently, all of the available metrics for evaluating the acceptability of synthetic speech do not distinguish sufficiently between measuring overall acceptability (including naturalness) and simply measuring the ability of listeners to extract intelligible information from the signal. In this paper we propose a new methodology for measuring the naturalness of particular aspects of synthesized speech, independent of the intelligibility of the speech. Although naturalness is a multidimensional, subjective quality of speech, this methodology makes it possible to assess the separate contributions of prosodic, segmental, and source characteristics of the utterance. In two experiments, listeners reliably differentiated the naturalness of speech produced by two male talkers and two text-to-speech systems. Furthermore, they reliably differentiated between the two text-to-speech systems. The results of these experiments demonstrate that perception of naturalness is affected by information contained within the smallest part of speech, the glottal pulse, and by information contained within the prosodic structure of a syllable. These results show that this new methodology does provide a solid basis for measuring and diagnosing the naturalness of synthetic speech.
Pages: 7-19
Page count: 12