EXPLORING NON-AUTOREGRESSIVE END-TO-END NEURAL MODELING FOR ENGLISH MISPRONUNCIATION DETECTION AND DIAGNOSIS

Cited by: 4
Authors
Wang, Hsin-Wei [1 ]
Yan, Bi-Cheng [1 ]
Chiu, Hsuan-Sheng [2 ]
Hsu, Yung-Chang [3 ]
Chen, Berlin [1 ]
Affiliations
[1] Natl Taiwan Normal Univ, Taipei, Taiwan
[2] Chunghwa Telecom Labs, Taipei, Taiwan
[3] EZAI, Taipei, Taiwan
Keywords
Computer-assisted language training; mispronunciation detection and diagnosis; non-autoregressive; pronunciation modeling;
DOI
10.1109/ICASSP43922.2022.9747569
CLC number (Chinese Library Classification)
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
End-to-end (E2E) neural modeling has emerged as a predominant school of thought for developing computer-assisted pronunciation training (CAPT) systems, showing performance competitive with conventional pronunciation-scoring based methods. However, current E2E neural methods for CAPT face at least two pivotal challenges. On the one hand, most E2E methods operate in an autoregressive manner with left-to-right beam search to dictate the pronunciation of an L2 learner. This, however, leads to very slow inference, which inevitably hinders their practical use. On the other hand, E2E neural methods are normally data-hungry, and an insufficient amount of non-native training data often reduces their efficacy on mispronunciation detection and diagnosis (MD&D). In response, we put forward a novel MD&D method that leverages non-autoregressive (NAR) E2E neural modeling to dramatically speed up inference while maintaining performance in line with conventional E2E neural methods. In addition, we design and develop a pronunciation modeling network stacked on top of our method's NAR E2E models to further boost the effectiveness of MD&D. Empirical experiments conducted on the L2-ARCTIC English dataset validate the feasibility of our method in comparison with several top-of-the-line E2E models and an iconic pronunciation-scoring based method built on a DNN-HMM acoustic model.
Pages: 6817-6821
Page count: 5