A Hierarchical Predictor of Synthetic Speech Naturalness Using Neural Networks

Cited by: 19
Authors
Yoshimura, Takenori [1]
Henter, Gustav Eje [2]
Watts, Oliver [2]
Wester, Mirjam [2]
Yamagishi, Junichi [2, 3]
Tokuda, Keiichi [1]
Affiliations
[1] Nagoya Inst Technol, Dept Sci & Engn Simulat, Nagoya, Aichi, Japan
[2] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh, Midlothian, Scotland
[3] Natl Inst Informat, Tokyo, Japan
Funding
UK Engineering and Physical Sciences Research Council (EPSRC); Japan Science and Technology Agency (JST);
Keywords
speech synthesis; naturalness; neural network; Blizzard Challenge;
DOI
10.21437/Interspeech.2016-847
Chinese Library Classification (CLC) number
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
A problem when developing and tuning speech synthesis systems is that there is no well-established method of automatically rating the quality of the synthetic speech. This research attempts to obtain a new automated measure which is trained on the result of large-scale subjective evaluations employing many human listeners, i.e., the Blizzard Challenge. To exploit the data, we experiment with linear regression, feed-forward and convolutional neural network models, and combinations of them to regress from synthetic speech to the perceptual scores obtained from listeners. The biggest improvements were seen when combining stimulus- and system-level predictions.
Pages: 342 - 346
Page count: 5
Related papers
50 items in total
  • [1] Automatic Naturalness Recognition from Acted Speech Using Neural Networks
    Atmaja, Bagus Tris
    Sasou, Akira
    Akagi, Masato
    [J]. 2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 731 - 736
  • [2] Synthetic Speech Detection Using Neural Networks
    Reimao, Ricardo
    Tzerpos, Vassilios
    [J]. 2021 INTERNATIONAL CONFERENCE ON SPEECH TECHNOLOGY AND HUMAN-COMPUTER DIALOGUE (SPED), 2021, : 97 - 102
  • [3] Measuring the naturalness of synthetic speech
    Howard C. Nusbaum
    Alexander L. Francis
    Anne S. Henly
    [J]. International Journal of Speech Technology, 1997, 2 (1) : 7 - 19
  • [4] CONSIDERATIONS ON PARCOR SYNTHETIC SPEECH NATURALNESS
    ISHII, N
    MURAKAMI, K
    KINOSHITA, K
    MIYAHARA, S
    [J]. REVIEW OF THE ELECTRICAL COMMUNICATIONS LABORATORIES, 1975, 23 (5-6) : 502 - 516
  • [5] Experimental study on the naturalness of synthetic speech
    LU Shinan
    [J]. Chinese Journal of Acoustics, 1993, (03) : 258 - 264
  • [6] Towards Linguistic Naturalness of Synthetic Speech
    Matousek, Jindrich
    Skarnitzl, Radek
    Tihelka, Daniel
    Machac, Pavel
    [J]. WORLD CONGRESS ON ENGINEERING AND COMPUTER SCIENCE, WCECS 2011, VOL I, 2011, : 561 - +
  • [7] Continuous mandarin speech recognition using hierarchical recurrent neural networks
    Liao, YF
    Chen, WY
    Chen, SH
    [J]. 1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6, 1996, : 3370 - 3373
  • [8] SPEECH RECOGNITION WITH HIERARCHICAL RECURRENT NEURAL NETWORKS
    CHEN, WY
    LIAO, YF
    CHEN, SH
    [J]. PATTERN RECOGNITION, 1995, 28 (06) : 795 - 805
  • [9] Single Channel Speech Source Separation Using Hierarchical Deep Neural Networks
    Noorani, Seyed Majid
    Seyedin, Sanaz
    [J]. 2020 28TH IRANIAN CONFERENCE ON ELECTRICAL ENGINEERING (ICEE), 2020, : 466 - 470
  • [10] Deep Learning Based Assessment of Synthetic Speech Naturalness
    Mittag, Gabriel
    Moeller, Sebastian
    [J]. INTERSPEECH 2020, 2020, : 1748 - 1752