CNN-RNN-CTC BASED END-TO-END MISPRONUNCIATION DETECTION AND DIAGNOSIS

Cited by: 0
Authors
Leung, Wai-Kim [1]
Liu, Xunying [1]
Meng, Helen [1]
Affiliations
[1] Chinese Univ Hong Kong, Human Comp Commun Lab, Dept Syst Engn & Engn Management, BDDA Res Ctr, Hong Kong, Peoples R China
Keywords
Computer Assisted Pronunciation Training (CAPT); Mispronunciation Detection and Diagnosis (MDD); Connectionist Temporal Classification (CTC); Convolutional Neural Network (CNN); e-learning;
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
This paper focuses on using a Convolutional Neural Network (CNN), a Recurrent Neural Network (RNN) and Connectionist Temporal Classification (CTC) to build an end-to-end speech recognition model for the Mispronunciation Detection and Diagnosis (MDD) task. Because the approach is end-to-end, it requires neither phonemic or graphemic information nor forced alignment between different linguistic units. We conduct experiments that compare the proposed CNN-RNN-CTC approach with alternative MDD approaches. The F-measure of our approach is 74.65%, which significantly outperforms the Extended Recognition Network (ERN) by 44.75% and the State-level Acoustic Model (S-AM) by 32.28%, both in relative terms. The relative improvements in F-measure over the Acoustic-Phonemic Model (APM), Acoustic-Graphemic Model (AGM) and Acoustic-Phonemic-Graphemic Model (APGM) are 9.57%, 5.04% and 2.77%, respectively.
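The improvements quoted in the abstract are relative, not absolute percentage points. As a minimal sketch of that arithmetic, the Python below assumes the usual definition, relative improvement = (new - baseline) / baseline, and back-calculates the baseline F-measures that the reported figures would imply. The function names are illustrative, and the implied baselines are approximate reconstructions, not values stated in the abstract.

```python
def relative_improvement(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


def implied_baseline(new: float, rel_improvement_pct: float) -> float:
    """Baseline score implied by `new` and a relative improvement in percent."""
    return new / (1.0 + rel_improvement_pct / 100.0)


# F-measure (%) of the proposed CNN-RNN-CTC approach, as stated in the abstract.
F_PROPOSED = 74.65

# Relative improvements (%) over each baseline, as stated in the abstract.
REPORTED = {"ERN": 44.75, "S-AM": 32.28, "APM": 9.57, "AGM": 5.04, "APGM": 2.77}

# Assumption: these back-calculated baselines are NOT given in the abstract;
# they only follow from the standard relative-improvement definition above.
implied = {name: implied_baseline(F_PROPOSED, pct) for name, pct in REPORTED.items()}

for name, f_score in implied.items():
    print(f"{name}: implied baseline F-measure of about {f_score:.2f}%")
```

Read this way, a 44.75% relative gain over ERN corresponds to a much smaller gap in absolute percentage points, which is why the abstract qualifies the figures as relative.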
Pages: 8132 - 8136
Number of pages: 5