Improving human scoring of prosody using parametric speech synthesis

Cited by: 5
Authors
Prafianto, Hafiyan [1]
Nose, Takashi [1]
Chiba, Yuya [1]
Ito, Akinori [1]
Affiliation
[1] Tohoku Univ, Grad Sch Engn, Aoba Ku, Aramaki Aza Aoba 6-6-05, Sendai, Miyagi 9808579, Japan
Keywords
Computer assisted language learning (CALL); Computer assisted pronunciation training (CAPT); Automatic pronunciation evaluation system; Parametric speech synthesis; Average voice model; RECOGNITION;
DOI
10.1016/j.specom.2019.06.001
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This paper proposes a method that uses parametric speech synthesis to improve human scoring of non-native speakers' utterances. Rather than having raters assess each prosodic feature by listening directly to the learner's utterance, the method substitutes the features that are not being assessed with those of native speakers, so that raters can focus only on the target prosodic feature. Parametric speech synthesis is used to generate the substitute features; in this study, HMM-based speech synthesis from an average voice model of native speakers was used. The experimental results show that the proposed method improves scoring reliability, as confirmed by an increase in inter-rater correlation. We also built automatic pronunciation evaluation systems trained on non-native speech databases scored with either the conventional or the proposed method, and compared their performance. The predicted pronunciation scores agreed with the human-rated scores: the human-machine correlation was 0.87 with the proposed scoring method, compared with 0.74 with the conventional scoring method.
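The two ideas in the abstract, building a rating stimulus in which only the target feature comes from the learner, and measuring scoring reliability by correlation, can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature names (`f0`, `duration`, `spectrum`), the dictionary layout, and the helper names `build_stimulus` and `pearson` are hypothetical; the streams are assumed to be directly recombinable (the abstract does not describe alignment or waveform generation); and the abstract does not name the correlation coefficient, so Pearson's r is an assumption.

```python
import math

# Hypothetical feature streams; the abstract does not list which prosodic
# features were assessed or how they were parameterized.
FEATURES = ("f0", "duration", "spectrum")


def build_stimulus(learner: dict, native: dict, target: str) -> dict:
    """Return feature streams for one rating stimulus.

    Only the `target` stream is taken from the learner; every other stream
    is taken from the native (average-voice) synthesis, so the rater hears
    the learner's realization of just that one feature. Assumes both dicts
    hold streams that can be recombined directly (alignment and waveform
    generation are out of scope here).
    """
    if target not in FEATURES:
        raise ValueError(f"unknown feature: {target!r}")
    return {name: learner[name] if name == target else native[name]
            for name in FEATURES}


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(ss_x * ss_y)


if __name__ == "__main__":
    # Placeholder strings stand in for real parameter arrays.
    learner = {"f0": "learner_f0", "duration": "learner_dur", "spectrum": "learner_sp"}
    native = {"f0": "native_f0", "duration": "native_dur", "spectrum": "native_sp"}
    print(build_stimulus(learner, native, target="f0"))

    # Hypothetical scores from two raters for the same five utterances.
    rater_a = [3, 4, 2, 5, 4]
    rater_b = [3, 5, 2, 4, 4]
    print(round(pearson(rater_a, rater_b), 2))
```

With these made-up scores the two raters correlate at about 0.81; a higher value means more consistent scoring. The same `pearson` helper covers the human-machine case by passing human scores and system-predicted scores for the same utterances.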
Pages: 14-21
Number of pages: 8
Related papers
50 in total
  • [41] Wideband Parametric Speech Synthesis Using Warped Linear Prediction
    Raitio, Tuomo
    Suni, Antti
    Vainio, Martti
    Alku, Paavo
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1418 - 1421
  • [42] Statistical parametric speech synthesis for Arabic language using ANN
    Ilyes, Rebai
    BenAyed, Yassine
    2014 1ST INTERNATIONAL CONFERENCE ON ADVANCED TECHNOLOGIES FOR SIGNAL AND IMAGE PROCESSING (ATSIP 2014), 2014, : 452 - 457
  • [43] Statistical Parametric Speech Synthesis Using Deep Gaussian Processes
    Koriyama, Tomoki
    Kobayashi, Takao
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27 (05) : 948 - 959
  • [44] Statistical parametric speech synthesis using a hidden trajectory model
    Cai, Ming-Qi
    Ling, Zhen-Hua
    Dai, Li-Rong
    SPEECH COMMUNICATION, 2015, 72 : 149 - 159
  • [45] Statistical Parametric Speech Synthesis Using Generalized Distillation Framework
    Liu, Zheng-Chen
    Ling, Zhen-Hua
    Dai, Li-Rong
    IEEE SIGNAL PROCESSING LETTERS, 2018, 25 (05) : 695 - 699
  • [46] Spanish Statistical Parametric Speech Synthesis using a Neural Vocoder
    Bonafonte, Antonio
    Pascual, Santiago
    Dorca, Georgina
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 1998 - 2001
  • [47] The Effect of Human Prosody on Comprehension of TTS Robot Speech
    Coyne, Adam K.
    McGinn, Conor
    2023 32ND IEEE INTERNATIONAL CONFERENCE ON ROBOT AND HUMAN INTERACTIVE COMMUNICATION, RO-MAN, 2023, : 1816 - 1822
  • [48] Intonational speech prosody encoding in the human auditory cortex
    Tang, C.
    Hamilton, L. S.
    Chang, E. F.
    SCIENCE, 2017, 357 (6353) : 797 - 801
  • [49] Improving Naturalness in Speech Synthesis Using Fuzzy Logic
    Shah, B. Gargi
    Sajja, S. Priti
    Lecture Notes in Networks and Systems, 2023, 645 LNNS : 225 - 238
  • [50] Speech Modification for Prosody Conversion in Expressive Marathi Text-to-Speech Synthesis
    Anil, Manjare Chandraprabha
    Shirbahadurkar, S. D.
    2014 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND INTEGRATED NETWORKS (SPIN), 2014, : 56 - 58