EXPLORING NON-AUTOREGRESSIVE END-TO-END NEURAL MODELING FOR ENGLISH MISPRONUNCIATION DETECTION AND DIAGNOSIS

Cited by: 4
Authors
Wang, Hsin-Wei [1]
Yan, Bi-Cheng [1]
Chiu, Hsuan-Sheng [2]
Hsu, Yung-Chang [3]
Chen, Berlin [1]
Affiliations
[1] Natl Taiwan Normal Univ, Taipei, Taiwan
[2] Chunghwa Telecom Labs, Taipei, Taiwan
[3] EZAI, Taipei, Taiwan
Keywords
Computer-assisted language training; mispronunciation detection and diagnosis; non-autoregressive; pronunciation modeling
DOI
10.1109/ICASSP43922.2022.9747569
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
End-to-end (E2E) neural modeling has emerged as one predominant school of thought for developing computer-assisted pronunciation training (CAPT) systems, performing competitively with conventional pronunciation-scoring based methods. However, current E2E neural methods for CAPT face at least two pivotal challenges. On the one hand, most E2E methods operate in an autoregressive manner, using left-to-right beam search to dictate the pronunciations of an L2 learner. This, however, leads to very slow inference, which inevitably hinders their practical use. On the other hand, E2E neural methods are normally data-hungry, and an insufficient amount of nonnative training data often reduces their efficacy on mispronunciation detection and diagnosis (MD&D). In response, we put forward a novel MD&D method that leverages non-autoregressive (NAR) E2E neural modeling to dramatically speed up inference while maintaining performance in line with conventional E2E neural methods. In addition, we design and develop a pronunciation modeling network stacked on top of the NAR E2E models of our method to further boost the effectiveness of MD&D. Empirical experiments conducted on the L2-ARCTIC English dataset seem to validate the feasibility of our method, in comparison to some top-of-the-line E2E models and an iconic pronunciation-scoring based method built on a DNN-HMM acoustic model.
Pages: 6817-6821
Number of pages: 5