Non-Autoregressive End-to-End Neural Modeling for Automatic Pronunciation Error Detection

Cited by: 1
Authors
Wadud, Md. Anwar Hussen [1]
Alatiyyah, Mohammed [2]
Mridha, M. F. [3]
Affiliations
[1] Bangladesh Univ Business & Technol, Dept Comp Sci & Engn, Dhaka 1216, Bangladesh
[2] Prince Sattam Bin Abdulaziz Univ, Coll Sci & Humanities Aflaj, Dept Comp Sci, Al Kharj 16278, Saudi Arabia
[3] Amer Int Univ Bangladesh, Dept Comp Sci & Engn, Dhaka 1216, Bangladesh
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 01
Keywords
non-autoregressive; pronunciation modeling; speech recognition; mispronunciation detection and diagnosis; attention; computer-assisted pronunciation training (CAPT)
DOI
10.3390/app13010109
Chinese Library Classification (CLC) number
O6 [Chemistry]
Subject classification code
0703
Abstract
A crucial element of computer-assisted pronunciation training (CAPT) systems is the mispronunciation detection and diagnosis (MDD) technique. The provided transcriptions can act as a teacher when evaluating the pronunciation quality of finite speech. Conventional approaches, such as forced alignment and extended recognition networks, have made full use of these prior texts for model development or for enhancing system performance. End-to-end (E2E) approaches have recently attempted to incorporate the prior texts into model training, and preliminary results indicate efficacy. However, attention-based end-to-end models have shown lower speech recognition performance because multi-pass left-to-right forward computation constrains their practical applicability in beam search. In addition, end-to-end neural approaches are typically data-hungry, and a lack of non-native training data frequently impairs their effectiveness in MDD. To address these problems, we propose a novel MDD technique that uses non-autoregressive (NAR) end-to-end neural models to greatly reduce inference time while maintaining accuracy similar to that of conventional E2E neural models. Unlike left-to-right forward computation, NAR models accept parallel inputs and generate token sequences in parallel. To further enhance the effectiveness of MDD, we design and build a pronunciation model on top of the NAR end-to-end models in our approach. To compare our approach against some of the best end-to-end models, we use the publicly accessible L2-ARCTIC and SpeechOcean English datasets for training and testing; the proposed model achieves better results than the existing models.
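As a rough illustration of the detection-and-diagnosis step the abstract refers to, the sketch below aligns a canonical phone sequence (from the provided transcription) with a recognized phone sequence and labels each position. It is not the paper's NAR model or its pronunciation model: the function name `diagnose`, the ARPAbet-style phone labels, and the use of Python's `difflib` for alignment are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def diagnose(canonical, recognized):
    """Label each phone as correct, substitution, deletion, or insertion.

    Hypothetical helper: `canonical` is the phone sequence implied by the
    prompt text, `recognized` is whatever a phone recognizer produced.
    """
    report = []
    matcher = SequenceMatcher(a=canonical, b=recognized, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        ref, hyp = canonical[i1:i2], recognized[j1:j2]
        if tag == "equal":
            report += [("correct", p, p) for p in ref]
            continue
        # For non-matching spans, pair phones position by position and
        # treat any leftover on either side as a deletion or an insertion.
        paired = min(len(ref), len(hyp))
        report += [("substitution", r, h) for r, h in zip(ref, hyp)]
        report += [("deletion", r, None) for r in ref[paired:]]
        report += [("insertion", None, h) for h in hyp[paired:]]
    return report


if __name__ == "__main__":
    # Made-up example: the learner says "D" for "DH" and drops the final "T".
    for label, ref, hyp in diagnose(["DH", "AH", "K", "AE", "T"],
                                    ["D", "AH", "K", "AE"]):
        print(label, ref, hyp)
```

`difflib` produces a plausible alignment rather than a guaranteed minimum-edit-distance one, so a real evaluation would more likely use a dedicated edit-distance alignment.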
Pages: 18
Related papers
50 items in total
  • [21] IMPROVING NON-AUTOREGRESSIVE END-TO-END SPEECH RECOGNITION WITH PRE-TRAINED ACOUSTIC AND LANGUAGE MODELS
    Deng, Keqi
    Yang, Zehui
    Watanabe, Shinji
    Higuchi, Yosuke
    Cheng, Gaofeng
    Zhang, Pengyuan
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 8522 - 8526
  • [22] FAST-MD: FAST MULTI-DECODER END-TO-END SPEECH TRANSLATION WITH NON-AUTOREGRESSIVE HIDDEN INTERMEDIATES
    Inaguma, Hirofumi
    Dalmia, Siddharth
    Yan, Brian
    Watanabe, Shinji
    [J]. 2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 922 - 929
  • [23] End-to-end convolutional neural network design for automatic detection of influenza virus
    Lee, Junghwan
    Eom, Heesang
    Hariyani, Yuli Sun
    Kim, Cheonjung
    Yoo, Yongkyoung
    Lee, Jeonghoon
    Park, Cheolsoo
    [J]. Lee, Jeonghoon (jhlee0804@gmail.com), 1600, Institute of Electronics Engineers of Korea (10): : 31 - 36
  • [24] Modeling Coverage for Non-Autoregressive Neural Machine Translation
    Shan, Yong
    Feng, Yang
    Shao, Chenze
    [J]. 2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [25] Fast End-to-End Speech Recognition Via Non-Autoregressive Models and Cross-Modal Knowledge Transferring From BERT
    Bai, Ye
    Yi, Jiangyan
    Tao, Jianhua
    Tian, Zhengkun
    Wen, Zhengqi
    Zhang, Shuai
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 1897 - 1911
  • [26] An end-to-end TTS model with pronunciation predictor
    Han, Chol-Jin
    Ri, Un-Chol
    Mun, Song-Il
    Jang, Kang-Song
    Han, Song-Yun
    [J]. International Journal of Speech Technology, 2022, 25 (04) : 1013 - 1024
  • [27] End-to-End Mispronunciation Detection with Simulated Error Distance
    Zhang, Zhan
    Wang, Yuehai
    Yang, Jianyi
    [J]. INTERSPEECH 2022, 2022, : 4327 - 4331
  • [28] An end-to-end TTS model with pronunciation predictor
    Han C.-J.
    Ri U.-C.
    Mun S.-I.
    Jang K.-S.
    Han S.-Y.
    [J]. International Journal of Speech Technology, 2022, 25 (4) : 1013 - 1024
  • [29] DECOUPLING PRONUNCIATION AND LANGUAGE FOR END-TO-END CODE-SWITCHING AUTOMATIC SPEECH RECOGNITION
    Zhang, Shuai
    Yi, Jiangyan
    Tian, Zhengkun
    Bai, Ye
    Tao, Jianhua
    Wen, Zhengqi
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6249 - 6253
  • [30] End-to-End Neural Network for Vehicle Dynamics Modeling
    Hermansdorfer, Leonhard
    Trauth, Rainer
    Betz, Johannes
    Lienkamp, Markus
    [J]. 2020 6TH IEEE CONGRESS ON INFORMATION SCIENCE AND TECHNOLOGY (IEEE CIST'20), 2020, : 407 - 412