Improved mispronunciation detection with deep neural network trained acoustic models and transfer learning based logistic regression classifiers

Cited by: 137
Authors
Hu, Wenping [1, 2]
Qian, Yao [2 ]
Soong, Frank K. [2 ]
Wang, Yong [1 ]
Affiliations
[1] Univ Sci & Technol China, Hefei 230026, Peoples R China
[2] Microsoft Res Asia, Beijing 100080, Peoples R China
Keywords
Computer-aided language learning; Mispronunciation detection; Deep neural network; Logistic regression; Transfer learning; ERROR; KNOWLEDGE;
DOI
10.1016/j.specom.2014.12.008
Chinese Library Classification (CLC) number
O42 [Acoustics];
Subject classification codes
070206; 082403;
Abstract
Mispronunciation detection is an important part of a Computer-Aided Language Learning (CALL) system. By automatically pointing out where mispronunciations occur in an utterance, such a system gives a language learner informative, to-the-point feedback. In this paper, we improve mispronunciation detection performance with a Deep Neural Network (DNN) trained acoustic model and transfer learning based Logistic Regression (LR) classifiers. The acoustic model trained by the conventional GMM-HMM based approach is refined by DNN training with enhanced discrimination. The corresponding Goodness Of Pronunciation (GOP) scores are revised to evaluate the pronunciation quality of non-native language learners robustly. We propose a Neural Network (NN) based LR classifier for mispronunciation detection: a general neural network with shared hidden layers for extracting useful speech features is first pre-trained on pooled training data, in the sense of transfer learning, and phone-dependent, 2-class logistic regression classifiers are then trained as phone-specific output layer nodes. The new LR classifier streamlines the training of multiple individual classifiers by learning a common feature representation via the shared hidden layers. Experimental results on an isolated English word corpus recorded by non-native (L2) English learners show that the proposed GOP measure improves the performance of the GOP based mispronunciation detection approach: precision and recall are both improved by 7.4% compared with the conventional GOP estimated from GMM-HMM. The NN-based LR classifier improves the equal precision-recall rate by 25% over the best GOP based approach. It also outperforms the state-of-the-art Support Vector Machine (SVM) based classifier by 2.2% in equal precision-recall rate. Our approaches also achieve similar results on a continuous read L2 Mandarin language learning corpus. (C) 2014 Elsevier B.V. All rights reserved.
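The two-stage classifier training described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a single shared hidden layer, plain full-batch gradient descent, a single phone-independent output node for the pooled pre-training stage, and synthetic random features in place of real speech features. The class name SharedHiddenLR, all hyperparameters, and the toy data are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(p, y):
    """Mean binary cross-entropy between probabilities p and 0/1 labels y."""
    eps = 1e-12
    return float(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))

class SharedHiddenLR:
    """One shared hidden layer plus one 2-class logistic output node per phone
    (hypothetical simplification of the architecture in the abstract)."""

    def __init__(self, n_features, n_hidden, n_phones, seed=0):
        rng = np.random.default_rng(seed)
        self.W_h = rng.normal(0.0, 0.1, (n_features, n_hidden))
        self.b_h = np.zeros(n_hidden)
        self.W_o = rng.normal(0.0, 0.1, (n_phones, n_hidden))
        self.b_o = np.zeros(n_phones)

    def hidden(self, x):
        return sigmoid(x @ self.W_h + self.b_h)

    def predict_proba(self, x, phone_ids):
        # Each sample is scored only by the output node of its own phone.
        h = self.hidden(x)
        logits = np.sum(h * self.W_o[phone_ids], axis=1) + self.b_o[phone_ids]
        return sigmoid(logits)

    def step(self, x, phone_ids, y, lr=0.2, update_hidden=True):
        """One full-batch gradient descent step on the cross-entropy loss."""
        h = self.hidden(x)
        err = (self.predict_proba(x, phone_ids) - y) / len(y)
        grad_W_o = np.zeros_like(self.W_o)
        grad_b_o = np.zeros_like(self.b_o)
        # Accumulate gradients per phone; np.add.at handles repeated indices.
        np.add.at(grad_W_o, phone_ids, err[:, None] * h)
        np.add.at(grad_b_o, phone_ids, err)
        if update_hidden:
            dz = err[:, None] * self.W_o[phone_ids] * h * (1.0 - h)
            self.W_h -= lr * (x.T @ dz)
            self.b_h -= lr * dz.sum(axis=0)
        self.W_o -= lr * grad_W_o
        self.b_o -= lr * grad_b_o

# Toy data (assumption): label 1 = mispronounced, with a phone-specific threshold.
rng = np.random.default_rng(1)
n_samples, n_features, n_hidden, n_phones = 300, 4, 8, 3
x = rng.normal(size=(n_samples, n_features))
phone_ids = rng.integers(0, n_phones, size=n_samples)
offsets = np.array([-0.5, 0.0, 0.5])
y = (x[:, 0] + offsets[phone_ids] > 0).astype(float)

# Stage 1: pre-train the hidden layer on pooled data with one shared output node.
pooled = SharedHiddenLR(n_features, n_hidden, n_phones=1)
pooled_ids = np.zeros(n_samples, dtype=int)
for _ in range(300):
    pooled.step(x, pooled_ids, y)

# Stage 2: copy and freeze the hidden layer, train phone-specific output nodes.
model = SharedHiddenLR(n_features, n_hidden, n_phones)
model.W_h, model.b_h = pooled.W_h.copy(), pooled.b_h.copy()
loss_before = bce(model.predict_proba(x, phone_ids), y)
for _ in range(300):
    model.step(x, phone_ids, y, update_hidden=False)
probs = model.predict_proba(x, phone_ids)
loss_after = bce(probs, y)
print(f"stage-2 loss: {loss_before:.3f} -> {loss_after:.3f}")
```

Freezing the hidden layer in stage 2 reduces each phone's classifier to an ordinary logistic regression over shared features, which is one plausible reading of "phone-specific output layer nodes"; the paper itself may fine-tune the hidden layers as well.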
Pages: 154-166
Number of pages: 13