Non-Autoregressive End-to-End Neural Modeling for Automatic Pronunciation Error Detection

Cited by: 1
Authors
Wadud, Md. Anwar Hussen [1 ]
Alatiyyah, Mohammed [2 ]
Mridha, M. F. [3 ]
Affiliations
[1] Bangladesh Univ Business & Technol, Dept Comp Sci & Engn, Dhaka 1216, Bangladesh
[2] Prince Sattam Bin Abdulaziz Univ, Coll Sci & Humanities Aflaj, Dept Comp Sci, Al Kharj 16278, Saudi Arabia
[3] Amer Int Univ Bangladesh, Dept Comp Sci & Engn, Dhaka 1216, Bangladesh
Source
APPLIED SCIENCES-BASEL | 2023, Vol. 13, No. 1
Keywords
non-autoregressive; pronunciation modeling; speech recognition; mispronunciation detection and diagnosis; attention; computer-assisted pronunciation training (CAPT);
DOI
10.3390/app13010109
Chinese Library Classification (CLC)
O6 [Chemistry]
Subject Classification Code
0703
Abstract
A crucial element of computer-assisted pronunciation training (CAPT) systems is the mispronunciation detection and diagnosis (MDD) technique. The given transcriptions can act as a teacher when evaluating the pronunciation quality of the corresponding speech. Conventional approaches, such as forced alignment and extended recognition networks, have made full use of these prior texts for model development or for enhancing system performance. More recently, end-to-end (E2E) approaches have attempted to incorporate the prior texts into model training, and preliminary results indicate their efficacy. Attention-based E2E models, however, have shown degraded performance in practice because the multi-pass, left-to-right forward computation required for beam search constrains their applicability. In addition, E2E neural approaches are typically data-hungry, and a lack of non-native training data frequently impairs their effectiveness for MDD. To address these problems, we propose an MDD technique that uses non-autoregressive (NAR) E2E neural models to greatly reduce estimation time while maintaining accuracy comparable to conventional E2E neural models. Instead of left-to-right forward computation, NAR models accept the full input in parallel and generate the output token sequence in parallel. To further enhance MDD performance, we design a pronunciation model stacked on top of the NAR E2E model. To compare our approach with several strong E2E baselines, we train and evaluate on the publicly available L2-ARCTIC and SpeechOcean English datasets, where the proposed model achieves the best results among the models considered.
Pages: 18
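As a rough illustration of the non-autoregressive decoding idea described in the abstract, the sketch below performs a single-pass, CTC-style greedy decode over per-frame phone posteriors and then aligns the recognized phones against the canonical (prompt) phones to flag mispronunciations. This is only a minimal sketch, not the paper's implementation: the phone inventory, the toy posteriors, and the helper names (ctc_greedy_decode, detect_mispronunciations) are assumptions introduced here for illustration.

```python
# Minimal sketch of non-autoregressive (CTC-style) mispronunciation detection.
# Assumes an acoustic model has already produced per-frame phone posteriors in
# a single parallel pass; the values below are toy numbers, not real outputs.
from difflib import SequenceMatcher

BLANK = "<blank>"

def ctc_greedy_decode(frame_posteriors, phones):
    """Pick the best phone per frame, then collapse repeats and drop blanks."""
    best = [phones[max(range(len(p)), key=p.__getitem__)] for p in frame_posteriors]
    collapsed, prev = [], None
    for sym in best:
        if sym != prev and sym != BLANK:
            collapsed.append(sym)
        prev = sym
    return collapsed

def detect_mispronunciations(recognized, canonical):
    """Align recognized phones against the canonical prompt phones and report
    substitutions, insertions, and deletions as mispronunciation hypotheses."""
    errors = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, canonical, recognized).get_opcodes():
        if tag != "equal":
            errors.append((tag, canonical[i1:i2], recognized[j1:j2]))
    return errors

if __name__ == "__main__":
    phones = [BLANK, "f", "ih", "sh", "s"]
    # Toy per-frame posteriors for the word "fish" mispronounced as "fis".
    frames = [
        [0.10, 0.80, 0.05, 0.03, 0.02],  # f
        [0.10, 0.70, 0.10, 0.05, 0.05],  # f (repeat, collapsed)
        [0.10, 0.05, 0.80, 0.03, 0.02],  # ih
        [0.70, 0.10, 0.10, 0.05, 0.05],  # blank
        [0.10, 0.05, 0.05, 0.10, 0.70],  # s substituted for sh
    ]
    recognized = ctc_greedy_decode(frames, phones)
    print(detect_mispronunciations(recognized, canonical=["f", "ih", "sh"]))
    # -> [('replace', ['sh'], ['s'])]
```

The point of the sketch is that the posteriors are decoded in one parallel pass, so estimation time does not grow with an autoregressive, left-to-right beam search; a full system would replace the toy frames with the outputs of a trained NAR acoustic model and add the pronunciation model described in the abstract.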