Improved mispronunciation detection with deep neural network trained acoustic models and transfer learning based logistic regression classifiers

Times cited: 137
Authors
Hu, Wenping [1, 2]
Qian, Yao [2]
Soong, Frank K. [2]
Wang, Yong [1]
Affiliations
[1] Univ Sci & Technol China, Hefei 230026, Peoples R China
[2] Microsoft Res Asia, Beijing 100080, Peoples R China
Keywords
Computer-aided language learning; Mispronunciation detection; Deep neural network; Logistic regression; Transfer learning; ERROR; KNOWLEDGE
DOI
10.1016/j.specom.2014.12.008
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Mispronunciation detection is an important part of a Computer-Aided Language Learning (CALL) system. By automatically pointing out where mispronunciations occur in an utterance, the system gives a language learner informative, to-the-point feedback. In this paper, we improve mispronunciation detection performance with a Deep Neural Network (DNN) trained acoustic model and transfer-learning-based Logistic Regression (LR) classifiers. The acoustic model trained with the conventional GMM-HMM approach is refined by DNN training, which enhances its discrimination. The corresponding Goodness of Pronunciation (GOP) scores are revised so that they evaluate the pronunciation quality of non-native language learners robustly. We propose a Neural Network (NN) based LR classifier for mispronunciation detection: a general neural network whose shared hidden layers extract useful speech features is first pre-trained on pooled training data, in the sense of transfer learning, and phone-dependent, 2-class LR classifiers are then trained as phone-specific output layer nodes. By learning a common feature representation in the shared hidden layers, the new LR classifier streamlines the training of multiple classifiers that would otherwise be trained separately. Experimental results on an isolated English word corpus recorded by non-native (L2) English learners show that the proposed GOP measure improves GOP-based mispronunciation detection: compared with the conventional GOP estimated from a GMM-HMM, both precision and recall improve by 7.4%. The NN-based LR classifier improves the equal precision-recall rate by 25% over the best GOP-based approach, and it outperforms the state-of-the-art Support Vector Machine (SVM) based classifier by 2.2% in equal precision-recall rate. Our approaches also achieve similar results on a continuous read L2 Mandarin language learning corpus. (C) 2014 Elsevier B.V. All rights reserved.
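The abstract names two components: GOP scores computed from DNN outputs, and a network whose shared hidden layers feed phone-specific two-class LR output nodes. The Python sketch below illustrates both, under stated assumptions. The GOP formula used here (the frame-averaged log ratio of the canonical phone's posterior to the highest posterior in each frame) is one common DNN-based formulation and not necessarily the paper's exact revised GOP. The class `SharedHiddenLR`, its fixed shared-layer weights (standing in for the pooled pre-training, which is not shown), the choice to keep the shared layer frozen while a phone head is trained, and the two-dimensional toy features are all hypothetical.

```python
import math


def gop_score(frame_posteriors, phone):
    """Posterior-ratio GOP for one phone segment (assumed formulation).

    frame_posteriors: one dict per frame aligned to `phone`, mapping each phone
    label q to a DNN posterior P(q | o_t). Returns the frame-averaged log ratio of
    the canonical phone's posterior to the best posterior: 0.0 means the canonical
    phone wins every frame; more negative values suggest a mispronunciation.
    """
    total = sum(math.log(post[phone]) - math.log(max(post.values()))
                for post in frame_posteriors)
    return total / len(frame_posteriors)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class SharedHiddenLR:
    """Hypothetical sketch: one shared hidden layer, one 2-class LR node per phone."""

    def __init__(self, W, b, phones):
        # Shared hidden layer; these weights stand in for a layer pre-trained on
        # pooled data from all phones (the pre-training itself is not shown).
        self.W, self.b = W, b
        # Phone-specific LR heads as (weights, bias), initialised to zero.
        self.heads = {p: ([0.0] * len(W), 0.0) for p in phones}

    def hidden(self, x):
        # Common feature representation shared by every phone head.
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj)
                for row, bj in zip(self.W, self.b)]

    def prob_mispronounced(self, x, phone):
        w, bias = self.heads[phone]
        h = self.hidden(x)
        return sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + bias)

    def train_head(self, phone, data, lr=0.5, epochs=200):
        """SGD on cross-entropy for one phone's head; the shared layer stays frozen."""
        w, bias = list(self.heads[phone][0]), self.heads[phone][1]
        for _ in range(epochs):
            for x, y in data:  # y = 1 for mispronounced, 0 for correct
                h = self.hidden(x)
                # Gradient of the cross-entropy loss with respect to the logit.
                g = sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + bias) - y
                w = [wi - lr * g * hi for wi, hi in zip(w, h)]
                bias -= lr * g
        self.heads[phone] = (w, bias)


# Usage with made-up numbers (not data from the paper).
frames = [{"ae": 0.2, "eh": 0.8}, {"ae": 0.3, "eh": 0.7}]
print("GOP(ae):", gop_score(frames, "ae"))

model = SharedHiddenLR(W=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                       b=[0.0, 0.0, 0.0], phones=["ae", "iy"])
toy = [([0.0, 0.1], 0), ([-0.1, 0.0], 0), ([-2.0, -1.5], 1), ([-2.5, -1.8], 1)]
model.train_head("ae", toy)
print("P(mispronounced):", model.prob_mispronounced([-2.2, -1.6], "ae"))
```

Because every phone head reads the same hidden representation, supporting another phone only adds one small output node rather than a separately trained classifier, which is the streamlining the abstract describes.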
Pages: 154-166
Number of pages: 13
Related papers
50 in total
  • [1] Mispronunciation Detection Using Deep Convolutional Neural Network Features and Transfer Learning-Based Model for Arabic Phonemes
    Nazir, Faria
    Majeed, Muhammad Nadeem
    Ghazanfar, Mustansar Ali
    Maqsood, Muazzam
    IEEE ACCESS, 2019, 7 : 52589 - 52608
  • [2] A New Neural Network Based Logistic Regression Classifier For Improving Mispronunciation Detection of L2 Language Learners
    Hu, Wenping
    Qian, Yao
    Soong, Frank K.
    2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, : 245 - +
  • [3] Comparison of logistic regression and neural network-based classifiers for bacterial growth
    Hajmeer, M
    Basheer, I
    FOOD MICROBIOLOGY, 2003, 20 (01) : 43 - 55
  • [4] Comparison of Logistic Regression and Neural Network Classifiers in the Detection of Hard Exudates in Retinal Images
    Garcia, Maria
    Valverde, Carmen
    Lopez, Maria I.
    Poza, Jesus
    Hornero, Roberto
    2013 35TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC), 2013, : 5891 - 5894
  • [5] Lithography Hotspot Detection Method Based on Transfer Learning Using Pre-Trained Deep Convolutional Neural Network
    Liao, Lufeng
    Li, Sikun
    Che, Yongqiang
    Shi, Weijie
    Wang, Xiangzhao
    APPLIED SCIENCES-BASEL, 2022, 12 (04):
  • [6] Transfer Learning for Automatic Image Orientation Detection Using Deep Learning and Logistic Regression
    Amjoud, Ayoub Benali
    Amrouch, Mustapha
    IEEE ACCESS, 2022, 10 : 128543 - 128553
  • [7] A Survey: Neural Network-Based Deep Learning for Acoustic Event Detection
    Xia, Xianjun
    Togneri, Roberto
    Sohel, Ferdous
    Zhao, Yuanjun
    Huang, Defeng
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2019, 38 (08) : 3433 - 3453
  • [8] Ensemble of jointly trained deep neural network-based acoustic models for reverberant speech recognition
    Lee, Moa
    Lee, Jeehye
    Chang, Joon-Hyuk
    DIGITAL SIGNAL PROCESSING, 2019, 85 : 1 - 9
  • [9] Automated micro-plastic detection and classification using deep convolution neural network pre-trained models and transfer learning
    Devipriya, K.
    Tlija, Mehdi
    Kumar, Chanumolu Kiran
    Kumar, V. Chandra
    Jana, Subrata
    Jana, Chiranjibe
    AIP ADVANCES, 2025, 15 (02)