End-to-End Mispronunciation Detection with Simulated Error Distance

Times cited: 2
Authors
Zhang, Zhan [1]
Wang, Yuehai [1]
Yang, Jianyi [1]
Affiliation
[1] Zhejiang University, Department of Information and Electronic Engineering, Hangzhou, Zhejiang, People's Republic of China
Source
INTERSPEECH 2022
Keywords
mispronunciation detection; second language learning; speech recognition; transformer; speech
DOI
10.21437/Interspeech.2022-870
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
With the development of deep learning, the performance of mispronunciation detection models has improved greatly. However, annotating mispronunciations is expensive, because experts must carefully judge the error in every pronounced phoneme. As a result, supervised end-to-end mispronunciation detection models suffer from a shortage of data. Although text-based data augmentation can partially alleviate this problem, our analysis shows that it simulates only categorical phoneme errors, which is an inefficient simulation of real mispronunciations. In this paper, we propose a novel unit-based data augmentation method. Our method converts the continuous audio signal into robust audio vectors and then into a sequence of discrete units. By modifying this unit sequence, we generate more realistic mispronunciations and obtain the vector distance as an error indicator. Experiments on L2Arctic show that training on such simulated data improves mispronunciation detection performance compared with the text-based method.
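The abstract does not specify the quantizer, the modification rule, or the distance metric, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. The robust audio vectors are taken as given (in practice they might come from a self-supervised speech encoder). Each vector is mapped to its nearest centroid in an assumed k-means-style codebook to form discrete units, consecutive duplicates are collapsed, and random unit substitutions stand in for mispronunciations. The Euclidean distance between the original and substituted centroids plays the role of the error indicator, in contrast to text-based augmentation, where a substituted phoneme is simply right or wrong. All function names (`quantize`, `collapse_repeats`, `simulate_errors`) and the toy codebook are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical type: one frame-level "robust audio vector".
Vector = Tuple[float, ...]


def quantize(frames: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Map each frame vector to the index of its nearest codebook centroid (its discrete unit)."""
    return [
        min(range(len(codebook)), key=lambda k: math.dist(frame, codebook[k]))
        for frame in frames
    ]


def collapse_repeats(units: Sequence[int]) -> List[int]:
    """Merge runs of identical consecutive units into one unit (assumed preprocessing step)."""
    collapsed: List[int] = []
    for unit in units:
        if not collapsed or collapsed[-1] != unit:
            collapsed.append(unit)
    return collapsed


def simulate_errors(
    units: Sequence[int],
    codebook: Sequence[Vector],
    sub_prob: float,
    rng: random.Random,
) -> Tuple[List[int], List[float]]:
    """Randomly substitute units to imitate mispronunciations.

    Returns the modified unit sequence and, per position, the Euclidean distance
    between the original and substituted centroids (0.0 where nothing changed).
    """
    new_units: List[int] = []
    distances: List[float] = []
    for unit in units:
        if rng.random() < sub_prob:
            # Assumed substitution rule: pick any other unit uniformly at random.
            replacement = rng.choice([k for k in range(len(codebook)) if k != unit])
            new_units.append(replacement)
            distances.append(math.dist(codebook[unit], codebook[replacement]))
        else:
            new_units.append(unit)
            distances.append(0.0)
    return new_units, distances


if __name__ == "__main__":
    # Toy 2-D codebook with three centroids (illustrative values only).
    codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
    frames = [(0.1, 0.0), (0.2, 0.1), (0.9, 0.1), (0.1, 1.8)]

    units = collapse_repeats(quantize(frames, codebook))
    print(units)  # [0, 1, 2]

    noisy_units, error_distances = simulate_errors(
        units, codebook, sub_prob=0.5, rng=random.Random(0)
    )
    print(noisy_units, error_distances)
```

With this toy codebook, substituting unit 0 with unit 2 yields a distance of 2.0, whereas substituting it with unit 1 yields 1.0, so each simulated error carries a graded severity rather than a single categorical label.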
Pages: 4327–4331
Number of pages: 5