Improving human scoring of prosody using parametric speech synthesis

Cited by: 5

Authors:

Prafianto, Hafiyan [1]

Nose, Takashi [1]

Chiba, Yuya [1]

Ito, Akinori [1]

Affiliation:

[1] Tohoku Univ, Grad Sch Engn, Aoba Ku, Aramaki Aza Aoba 6-6-05, Sendai, Miyagi 9808579, Japan

Source:

SPEECH COMMUNICATION | 2019 / Vol. 111 / pp. 14-21

Keywords:

Computer assisted language learning (CALL); Computer assisted pronunciation training (CAPT); Automatic pronunciation evaluation system; Parametric speech synthesis; Average voice model; RECOGNITION;

DOI:

10.1016/j.specom.2019.06.001

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

This paper proposes a method that uses parametric speech synthesis to improve human scoring of non-native speakers' utterances. Rather than having raters assess each prosodic feature by listening to the original utterance, the method substitutes the features not under assessment with those of native speakers, so that raters can focus only on the target prosodic feature. We used parametric speech synthesis to generate the features for substitution; specifically, HMM-based speech synthesis from an average voice model of native speakers. The experimental results show that the proposed method improves scoring reliability, as confirmed by an increase in inter-rater correlation. We also built automatic pronunciation evaluation systems trained on non-native speech databases with scores given by either the conventional or the proposed method, and compared their performance. The results show that the predicted pronunciation scores matched the human-rated scores more closely with the proposed method: the human-machine correlation was 0.87, compared with 0.74 for the conventional scoring method.
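The abstract reports reliability as an inter-rater correlation but does not say how that figure is computed. As a rough illustration only, the sketch below assumes it is the mean pairwise Pearson correlation between raters' scores. The function names, rater names, and scores are all hypothetical and are not taken from the paper.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (var_x * var_y) ** 0.5


def inter_rater_correlation(ratings):
    """Mean pairwise Pearson correlation over all pairs of raters.

    `ratings` maps a rater name to that rater's scores, one per utterance,
    with utterances in the same order for every rater.
    Assumption: this averaging scheme is illustrative, not the paper's.
    """
    pairs = list(combinations(ratings.values(), 2))
    if not pairs:
        raise ValueError("need at least two raters")
    return mean(pearson(a, b) for a, b in pairs)


# Hypothetical 1-5 scores from three raters for the same five utterances.
example_ratings = {
    "rater_a": [1, 2, 3, 4, 5],
    "rater_b": [2, 2, 3, 4, 5],
    "rater_c": [1, 3, 3, 5, 4],
}

if __name__ == "__main__":
    print(f"inter-rater correlation: {inter_rater_correlation(example_ratings):.2f}")
```

Under this assumption, the same `pearson` helper could also be applied to one list of machine-predicted scores and one list of human scores to obtain a human-machine correlation of the kind quoted in the abstract.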


Pages: 14-21

Number of pages: 8

