Improving human scoring of prosody using parametric speech synthesis

Cited by: 5
Authors
Prafianto, Hafiyan [1]
Nose, Takashi [1]
Chiba, Yuya [1]
Ito, Akinori [1]
Affiliations
[1] Tohoku Univ, Grad Sch Engn, Aoba Ku, Aramaki Aza Aoba 6-6-05, Sendai, Miyagi 9808579, Japan
Keywords
Computer assisted language learning (CALL); Computer assisted pronunciation training (CAPT); Automatic pronunciation evaluation system; Parametric speech synthesis; Average voice model; Recognition
DOI
10.1016/j.specom.2019.06.001
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper proposes a method that uses parametric speech synthesis to improve human scoring of non-native speakers' utterances. Conventionally, raters assess each prosodic feature by listening directly to the utterance itself. In the proposed method, the features that are not being assessed are replaced with those of native speakers, so that the rater can focus only on the target prosodic feature. The substitute features are generated by parametric speech synthesis; in this study, HMM-based speech synthesis from an average voice model of native speakers was used. The experimental results show that the proposed method improves scoring reliability, as confirmed by an increase in inter-rater correlation. We also built automatic pronunciation evaluation systems trained on non-native speech databases scored by either the conventional or the proposed method, and compared their performance. The predicted pronunciation scores matched the human-rated scores more closely under the proposed method: the human-machine correlation was 0.87, compared with 0.74 for the conventional scoring method.
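The abstract measures scoring reliability by inter-rater correlation but does not say how that correlation is computed. The sketch below shows one common way to summarize it, as the mean Pearson correlation over all pairs of raters. The choice of Pearson correlation, the function names, and the sample scores are assumptions made for illustration; they are not taken from the paper.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length >= 2")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


def mean_pairwise_correlation(ratings):
    """Average Pearson correlation over every pair of raters.

    `ratings` holds one list per rater; position i in each list is that
    rater's score for utterance i.
    """
    pairs = list(combinations(ratings, 2))
    if not pairs:
        raise ValueError("need at least two raters")
    return mean(pearson(a, b) for a, b in pairs)


if __name__ == "__main__":
    # Hypothetical scores from three raters for five utterances
    # (illustrative values only, not data from the paper).
    ratings = [
        [1, 2, 3, 4, 5],
        [2, 2, 3, 5, 5],
        [1, 3, 3, 4, 4],
    ]
    print(f"mean inter-rater correlation: {mean_pairwise_correlation(ratings):.3f}")
```

The same `pearson` helper could be applied to a list of machine-predicted scores and a list of human scores to obtain a human-machine correlation of the kind quoted in the abstract.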
Pages: 14-21
Number of pages: 8
Related papers
50 in total
  • [1] Probabilistic Amplitude Demodulation Features in Speech Synthesis for Improving Prosody
    Lazaridis, Alexandros
    Cernak, Milos
    Garner, Philip N.
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 2298 - 2302
  • [2] Improving Speech Prosody of Audiobook Text-To-Speech Synthesis with Acoustic and Textual Contexts
    Xin, Detai
    Adavanne, Sharath
    Ang, Federico
    Kulkarni, Ashish
    Takamichi, Shinnosuke
    Saruwatari, Hiroshi
    ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 2023,
  • [3] Emotional speech synthesis using subspace constraints in prosody
    Mori, Shinya
    Moriyama, Tsuyoshi
    Ozawa, Shinji
    2006 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO - ICME 2006, VOLS 1-5, PROCEEDINGS, 2006, : 1093 - +
  • [4] PROSODY GENERATION USING FRAME-BASED GAUSSIAN PROCESS REGRESSION AND CLASSIFICATION FOR STATISTICAL PARAMETRIC SPEECH SYNTHESIS
    Koriyama, Tomoki
    Kobayashi, Takao
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4929 - 4933
  • [5] A parametric prosody coding approach for Mandarin speech using a hierarchical prosodic model
    Chiang, Chen-Yu
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2018,
  • [6] Prosody and the music of the human speech
    D'Autilia, R
    INTERNATIONAL JOURNAL OF MODERN PHYSICS B, 2004, 18 (13): : 1919 - 1929
  • [7] A parametric prosody coding approach for Mandarin speech using a hierarchical prosodic model
    Chen-Yu Chiang
    EURASIP Journal on Audio, Speech, and Music Processing, 2018
  • [8] Improving automated scoring of prosody in oral reading fluency using deep learning algorithm
    Wang, Kuo
    Qiao, Xin
    Sammit, George
    Larson, Eric C.
    Nese, Joseph
    Kamata, Akihito
    FRONTIERS IN EDUCATION, 2024, 9
  • [9] Compression of prosody for speech modification in synthesis
    Ansari, R
    Kurek, W
    THIRTY-FIRST ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS, VOLS 1 AND 2, 1998, : 219 - 223
  • [10] SPEECH BERT EMBEDDING FOR IMPROVING PROSODY IN NEURAL TTS
    Chen, Liping
    Deng, Yan
    Wang, Xi
    Soong, Frank K.
    He, Lei
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6563 - 6567