A Hierarchical Predictor of Synthetic Speech Naturalness Using Neural Networks

Cited by: 19
Authors
Yoshimura, Takenori [1]
Henter, Gustav Eje [2]
Watts, Oliver [2]
Wester, Mirjam [2]
Yamagishi, Junichi [2, 3]
Tokuda, Keiichi [1]
Affiliations
[1] Nagoya Inst Technol, Dept Sci & Engn Simulat, Nagoya, Aichi, Japan
[2] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh, Midlothian, Scotland
[3] Natl Inst Informat, Tokyo, Japan
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK; Japan Science and Technology Agency (JST)
Keywords
speech synthesis; naturalness; neural network; Blizzard Challenge
DOI
10.21437/Interspeech.2016-847
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
A problem when developing and tuning speech synthesis systems is that there is no well-established method of automatically rating the quality of the synthetic speech. This research attempts to obtain a new automated measure which is trained on the result of large-scale subjective evaluations employing many human listeners, i.e., the Blizzard Challenge. To exploit the data, we experiment with linear regression, feed-forward and convolutional neural network models, and combinations of them to regress from synthetic speech to the perceptual scores obtained from listeners. The biggest improvements were seen when combining stimulus- and system-level predictions.
Pages: 342-346
Number of pages: 5