A Two-pass Framework of Mispronunciation Detection & Diagnosis for Computer-aided Pronunciation Training

Cited by: 0

Authors:

Qian, Xiaojun [1]

Meng, Helen [1]

Soong, Frank [2]

Affiliations:

[1] Chinese Univ Hong Kong, Hong Kong, Hong Kong, Peoples R China

[2] Microsoft Res Asia, Beijing, Peoples R China

Source:

2015 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA) | 2015

Keywords:

DOI:

Not available

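Chinese Library Classification:

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Discipline classification codes:

0808 ; 0809 ;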

Abstract:

This paper presents a two-pass framework for mispronunciation detection and diagnosis (MD&D), in which detection is followed by diagnosis, without the need for explicit error pattern modeling, so that the main effort can be devoted to improving acoustic modeling by discriminative training (or by applying alternative models such as neural networks). The framework instantiates a set of anti-phones and a filler model in addition to the original phone model set, and crafts a general and compact phone error detection network. The detection network guarantees full coverage of all possible error patterns while maximally exploiting the constraint offered by the text prompt. Specifically, it includes anti-phones to detect substitutions, a filler model to detect insertions, and skips to detect deletions, so there are no prior assumptions about the possible form of error patterns. The subsequent diagnosis step expands the detected insertions and substitutions into phone networks, after which another recognition pass reveals the true identities of the detected errors. The crux of the approach is to bring down the modeling and recognition granularity in the detection pass. Discriminative training (DT) of the detection and diagnosis models, by minimizing the expected full-sequence phone-level errors in the two respective passes, reduces the overall phone-level MD&D error by 40% relative. In particular, visualization of the models in the framework shows that discriminative training effectively separates the canonical phones from their anti-phones.
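
The two passes described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify the network topology, so the arc layout, the label names (`<filler>`, `<skip>`, the `anti-` prefix), and both function names are assumptions made for illustration. No acoustic models or decoding are involved; the code only shows how a prompt could be turned into a detection network, and how detected errors could be expanded into candidate phone sets for a second pass.

```python
from typing import List, Tuple

Arc = Tuple[int, int, str]  # (source state, destination state, label)

FILLER = "<filler>"  # hypothetical label for the filler model (insertions)
SKIP = "<skip>"      # hypothetical label for a skip arc (deletions)
ANTI_PREFIX = "anti-"  # hypothetical naming scheme for anti-phones


def build_detection_network(prompt: List[str]) -> List[Arc]:
    """Build a linear detection network over the prompted phone sequence.

    Between consecutive states there are three arcs: the canonical phone
    (correct), its anti-phone (substitution), and a skip (deletion).
    Every state also carries a filler self-loop (insertion).
    """
    arcs: List[Arc] = []
    for i, phone in enumerate(prompt):
        arcs.append((i, i, FILLER))                    # insertion before phone i
        arcs.append((i, i + 1, phone))                 # correct pronunciation
        arcs.append((i, i + 1, ANTI_PREFIX + phone))   # substitution
        arcs.append((i, i + 1, SKIP))                  # deletion
    arcs.append((len(prompt), len(prompt), FILLER))    # insertion after last phone
    return arcs


def expand_for_diagnosis(detected: List[str], phone_set: List[str]) -> List[List[str]]:
    """Expand a detection-pass label sequence into per-slot phone candidates.

    Simplifying assumption: each detected filler stands for one inserted phone.
    """
    slots: List[List[str]] = []
    for label in detected:
        if label == FILLER:
            # An insertion could be any phone.
            slots.append(sorted(phone_set))
        elif label.startswith(ANTI_PREFIX):
            # A substitution could be any phone except the canonical one.
            canonical = label[len(ANTI_PREFIX):]
            slots.append(sorted(p for p in phone_set if p != canonical))
        elif label != SKIP:
            # A correctly pronounced phone stays fixed; deletions add no slot.
            slots.append([label])
    return slots


network = build_detection_network(["k", "ae", "t"])
candidates = expand_for_diagnosis(["k", "anti-ae", FILLER, SKIP], ["k", "ae", "t"])
```

In this sketch the network has four arcs per prompted phone plus one final filler loop, so its size grows linearly with the prompt length regardless of how many error patterns it covers.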


Pages: 384 - 387

Page count: 4

50 items in total

[1] A Two-Pass Framework of Mispronunciation Detection and Diagnosis for Computer-Aided Pronunciation Training
Qian, Xiaojun
Meng, Helen
Soong, Frank
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2016, 24 (06) : 1020 - 1028
[2] Discriminative Acoustic Model for Improving Mispronunciation Detection and Diagnosis in Computer-Aided Pronunciation Training (CAPT)
Qian, Xiaojun
Soong, Frank
Meng, Helen
11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010 : 757 - 760
[3] The Use of DBN-HMEMs for Mispronunciation Detection and Diagnosis in L2 English to Support Computer-Aided Pronunciation Training
Qian, Xiaojun
Meng, Helen
Soong, Frank
13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012 : 774 - 777
[4] Mispronunciation detection and diagnosis with acoustic pronunciation model aided modeling
Liu, Zongming
Wang, Li
Li, Junfeng
Zhang, Pengyuan
Shengxue Xuebao/Acta Acustica, 2023, 48 (01) : 264 - 273
[5] On Mispronunciation Lexicon Generation using Joint-sequence Multigrams in Computer-Aided Pronunciation Training (CAPT)
Qian, Xiaojun
Meng, Helen
Soong, Frank
12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011 : 872 - 875
[6] Error Pattern Detection Integrating Generative and Discriminative Learning for Computer-Aided Pronunciation Training
Wang, Yow-Bang
Lee, Lin-Shan
13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012 : 818 - 821
[7] A Recursive Dialogue Game for Personalized Computer-Aided Pronunciation Training
Su, Pei-hao
Wu, Chuan-hsun
Lee, Lin-shan
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2015, 23 (01) : 127 - 141
[8] Audiovisual Tools for Phonetic and Articulatory Visualization in Computer-Aided Pronunciation Training
Kroeger, Bernd J.
Birkholz, Peter
Hoffmann, Ruediger
Meng, Helen
DEVELOPMENT OF MULTIMODAL INTERFACES: ACTIVE LISTENING AND SYNCHRONY, 2010, 5967 : 337 - +
[9] Computer-Aided Diagnosis Framework for ADHD Detection Using Quantitative EEG
Holker, Ruchi
Susan, Seba
BRAIN INFORMATICS (BI 2022), 2022, 13406 : 229 - 240
[10] Optimization of computer-aided English pronunciation training data analysis system
Liang C.
Shang J.
Computer-Aided Design and Applications, 2021, 18 (s4) : 37 - 48
