A Two-pass Framework of Mispronunciation Detection & Diagnosis for Computer-aided Pronunciation Training

Cited by: 0
Authors
Qian, Xiaojun [1]
Meng, Helen [1]
Soong, Frank [2]
Affiliations
[1] Chinese Univ Hong Kong, Hong Kong, Hong Kong, Peoples R China
[2] Microsoft Res Asia, Beijing, Peoples R China
DOI
Not available
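Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline classification code
0808; 0809
Abstract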
This paper presents a two-pass framework for mispronunciation detection and diagnosis (MD&D): detection followed by diagnosis, without the need for explicit error pattern modeling, so that the main effort can be devoted to improving acoustic modeling by discriminative training (or by applying alternative models such as neural nets). The framework instantiates a set of anti-phones and a filler model in addition to the original phone model set, and crafts a general and compact phone error detection network. The detection network guarantees full coverage of all possible error patterns while maximally exploiting the constraint offered by the text prompt. Specifically, it includes anti-phones to detect substitutions, a filler model to detect insertions, and skips to detect deletions, so there are no prior assumptions about the possible form of error patterns. The subsequent diagnosis step expands the detected insertions and substitutions into phone networks, after which another recognition pass reveals the true identities of the detected errors. The crux of the approach is to reduce the modeling and recognition granularity in the detection pass. Discriminative training (DT) of the detection and diagnosis models, by minimizing the two expected full-sequence phone-level errors in the respective passes, reduces the overall phone-level MD&D error by a relative 40%. In particular, visualization of the models in the framework shows that discriminative training effectively separates the canonical phones from their anti-phones.
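The structure of the two passes described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it involves no acoustic models or decoding, and only shows how a prompt could be turned into a detection network and how a decoded path could be mapped to error types and diagnosis candidates. The names `FILLER`, `anti`, `build_detection_network`, `detect`, and `diagnosis_candidates` are hypothetical, as are the assumptions that at most one filler is allowed per gap and that a detected substitution expands to every phone other than the canonical one.

```python
from typing import Dict, List, Optional, Tuple

FILLER = "<filler>"  # hypothetical label for the filler model


def anti(phone: str) -> str:
    """Hypothetical naming convention for the anti-phone of a canonical phone."""
    return f"anti-{phone}"


def _gap_slot() -> Dict:
    # Between prompt phones: either nothing (None) or the filler (insertion).
    return {"canonical": None, "options": [None, FILLER]}


def build_detection_network(prompt: List[str]) -> List[Dict]:
    """Build slots for the detection pass from the prompt's canonical phones."""
    network = [_gap_slot()]
    for phone in prompt:
        # Canonical phone (correct), anti-phone (substitution), or skip (deletion).
        network.append({"canonical": phone, "options": [phone, anti(phone), None]})
        network.append(_gap_slot())
    return network


def detect(network: List[Dict],
           path: List[Optional[str]]) -> List[Tuple[int, str, Optional[str]]]:
    """Map a decoded path (one choice per slot) to (slot index, error type, canonical)."""
    if len(path) != len(network):
        raise ValueError("path must contain exactly one choice per slot")
    errors = []
    for i, (slot, choice) in enumerate(zip(network, path)):
        if choice not in slot["options"]:
            raise ValueError(f"choice {choice!r} is not allowed in slot {i}")
        canonical = slot["canonical"]
        if canonical is None:
            if choice == FILLER:
                errors.append((i, "insertion", None))
        elif choice is None:
            errors.append((i, "deletion", canonical))
        elif choice != canonical:
            errors.append((i, "substitution", canonical))
    return errors


def diagnosis_candidates(error: Tuple[int, str, Optional[str]],
                         phone_set: List[str]) -> List[str]:
    """Phones the second pass would choose among for one detected error."""
    _, kind, canonical = error
    if kind == "substitution":
        # Assumption: any phone except the canonical one may have been produced.
        return [p for p in phone_set if p != canonical]
    if kind == "insertion":
        return list(phone_set)
    return []  # a deletion needs no second recognition pass
```

For a prompt such as `["k", "ae", "t"]`, the network has seven slots (four gaps and three phone slots), and a path that picks the filler in one gap, the anti-phone for `ae`, and the skip for `t` would be reported as one insertion, one substitution, and one deletion. Only the insertion and the substitution would then be expanded into phone networks for the diagnosis pass.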
Pages: 384 - 387
Number of pages: 4
Related papers
50 in total
  • [1] A Two-Pass Framework of Mispronunciation Detection and Diagnosis for Computer-Aided Pronunciation Training
    Qian, Xiaojun
    Meng, Helen
    Soong, Frank
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2016, 24 (06) : 1020 - 1028
  • [2] Discriminative Acoustic Model for Improving Mispronunciation Detection and Diagnosis in Computer-Aided Pronunciation Training (CAPT)
    Qian, Xiaojun
    Soong, Frank
    Meng, Helen
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 757 - 760
  • [3] The Use of DBN-HMEMs for Mispronunciation Detection and Diagnosis in L2 English to Support Computer-Aided Pronunciation Training
    Qian, Xiaojun
    Meng, Helen
    Soong, Frank
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 774 - 777
  • [4] Mispronunciation detection and diagnosis with acoustic pronunciation model aided modeling
    Liu, Zongming
    Wang, Li
    Li, Junfeng
    Zhang, Pengyuan
    Shengxue Xuebao/Acta Acustica, 2023, 48 (01) : 264 - 273
  • [5] On Mispronunciation Lexicon Generation using Joint-sequence Multigrams in Computer-Aided Pronunciation Training (CAPT)
    Qian, Xiaojun
    Meng, Helen
    Soong, Frank
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 872 - 875
  • [6] Error Pattern Detection Integrating Generative and Discriminative Learning for Computer-Aided Pronunciation Training
    Wang, Yow-Bang
    Lee, Lin-Shan
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 818 - 821
  • [7] A Recursive Dialogue Game for Personalized Computer-Aided Pronunciation Training
    Su, Pei-hao
    Wu, Chuan-hsun
    Lee, Lin-shan
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2015, 23 (01) : 127 - 141
  • [8] Audiovisual Tools for Phonetic and Articulatory Visualization in Computer-Aided Pronunciation Training
    Kroeger, Bernd J.
    Birkholz, Peter
    Hoffmann, Ruediger
    Meng, Helen
    DEVELOPMENT OF MULTIMODAL INTERFACES: ACTIVE LISTENING AND SYNCHRONY, 2010, 5967 : 337 - +
  • [9] Computer-Aided Diagnosis Framework for ADHD Detection Using Quantitative EEG
    Holker, Ruchi
    Susan, Seba
    BRAIN INFORMATICS (BI 2022), 2022, 13406 : 229 - 240
  • [10] Optimization of computer-aided English pronunciation training data analysis system
    Liang C.
    Shang J.
    Computer-Aided Design and Applications, 2021, 18 (s4) : 37 - 48