Multimodal Unsupervised Speech Translation for Recognizing and Evaluating Second Language Speech

Cited by: 4
Authors
Lee, Yun Kyung [1]
Park, Jeon Gue [1]
Affiliations
[1] Elect & Telecommun Res Inst ETRI, Artificial Intelligence Res Lab, Daejeon 34129, South Korea
Source
APPLIED SCIENCES-BASEL | 2021 / Vol. 11 / Issue 06
Keywords
fluency evaluation; speech recognition; data augmentation; variational autoencoder; speech conversion; NONPARALLEL VOICE CONVERSION; BLIND SEPARATION; RECOGNITION
DOI
10.3390/app11062642
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline classification code
0703
Abstract
This paper addresses automatic proficiency evaluation and speech recognition for second language (L2) speech. The proposed method recognizes the speech uttered by the L2 speaker, measures a variety of fluency scores, and evaluates the proficiency of the speaker's spoken English. Stress and rhythm scores are among the important factors used to evaluate fluency in spoken English; they are computed by comparing the speaker's stress patterns and rhythm distributions with those of native speakers. To compute the stress and rhythm scores even when the phonemic sequence of the L2 speaker's English sentence differs from that of the native speaker, we align the phonemic sequences using a dynamic time-warping approach. We also improve the performance of the speech recognition system for non-native speakers and compute fluency features more accurately by augmenting the non-native training dataset and training an acoustic model on the augmented dataset. In this work, we augment the non-native speech by converting some characteristics of the speech signal (its style) while preserving its linguistic information. The proposed variational autoencoder (VAE)-based speech conversion network trains the conversion model by decomposing the spectral features of the speech into a speaker-invariant content factor and a speaker-specific style factor, so that it can estimate diverse and robust speech styles. Experimental results show that the proposed method effectively measures the fluency scores and generates diverse output signals. In the proficiency evaluation and speech recognition tests, the proposed method also improves proficiency scoring performance and speech recognition accuracy in all proficiency areas compared with a method that employs conventional acoustic models.
Pages: 16
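The abstract mentions aligning the L2 speaker's phonemic sequence with a native speaker's sequence by dynamic time warping. The following is a minimal sketch of that general idea only, not the authors' implementation. It assumes phonemes are plain string labels and uses a 0/1 mismatch cost; the function name `dtw_align`, the cost function, and the example phoneme labels are all hypothetical. Both sequences are assumed to be non-empty.

```python
from typing import List, Tuple


def dtw_align(learner: List[str], native: List[str]) -> Tuple[float, List[Tuple[int, int]]]:
    """Align two non-empty phoneme sequences with classic DTW.

    Assumption (not from the paper): local cost is 0 for identical
    phoneme labels and 1 otherwise.
    Returns the total alignment cost and the warping path as
    (learner_index, native_index) pairs.
    """
    n, m = len(learner), len(native)
    inf = float("inf")

    # cost[i][j] = best cost of aligning learner[:i] with native[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = 0.0 if learner[i - 1] == native[j - 1] else 1.0
            cost[i][j] = local + min(
                cost[i - 1][j - 1],  # advance in both sequences
                cost[i - 1][j],      # advance in the learner sequence only
                cost[i][j - 1],      # advance in the native sequence only
            )

    # Backtrack from the end to recover the warping path.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates, key=lambda c: c[0])
    path.reverse()
    return cost[n][m], path


if __name__ == "__main__":
    # Hypothetical example: the native reference holds one vowel longer.
    total, path = dtw_align(["K", "AE", "T"], ["K", "AE", "AE", "T"])
    print(total, path)
```

Once such a path is available, per-phoneme stress or rhythm measurements from the two speakers could be compared position by position along it, even when the sequences differ in length. How the paper actually defines its local cost and scoring is not stated in the abstract.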