Improved mispronunciation detection with deep neural network trained acoustic models and transfer learning based logistic regression classifiers

Cited by: 137
Authors
Hu, Wenping [1, 2]
Qian, Yao [2]
Soong, Frank K. [2]
Wang, Yong [1]
Affiliations
[1] Univ Sci & Technol China, Hefei 230026, Peoples R China
[2] Microsoft Res Asia, Beijing 100080, Peoples R China
Keywords
Computer-aided language learning; Mispronunciation detection; Deep neural network; Logistic regression; Transfer learning; ERROR; KNOWLEDGE;
DOI
10.1016/j.specom.2014.12.008
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Mispronunciation detection is an important part of a Computer-Aided Language Learning (CALL) system. By automatically pointing out where mispronunciations occur in an utterance, such a system gives a language learner informative and to-the-point feedback. In this paper, we improve mispronunciation detection performance with a Deep Neural Network (DNN) trained acoustic model and transfer learning based Logistic Regression (LR) classifiers. The acoustic model trained by the conventional GMM-HMM based approach is refined by DNN training with enhanced discrimination. The corresponding Goodness Of Pronunciation (GOP) scores are revised to evaluate the pronunciation quality of non-native language learners robustly. We propose a Neural Network (NN) based Logistic Regression (LR) classifier for mispronunciation detection: a general neural network with shared hidden layers for extracting useful speech features is first pre-trained with pooled training data, in the sense of transfer learning, and then phone-dependent, 2-class logistic regression classifiers are trained as phone-specific output layer nodes. The new LR classifier streamlines the training of multiple individual classifiers by learning a common feature representation via the shared hidden layers. Experimental results on an isolated English word corpus recorded by non-native (L2) English learners show that the proposed GOP measure improves the performance of the GOP based mispronunciation detection approach: precision and recall are both improved by 7.4% compared with the conventional GOP estimated from GMM-HMM. The NN-based LR classifier improves the equal precision-recall rate by 25% over the best GOP based approach. It also outperforms the state-of-the-art Support Vector Machine (SVM) based classifier by 2.2% in equal precision-recall rate. Our approaches also achieve similar results on a continuous read, L2 Mandarin language learning corpus. (C) 2014 Elsevier B.V. All rights reserved.
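The classifier structure described in the abstract (hidden layers shared across phones, with one 2-class logistic regression output node per phone) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the two-dimensional toy features, the phone labels "AE" and "TH", and the learning rate are all hypothetical, and the shared layer below uses fixed random weights as a stand-in for the pre-training on pooled data that the abstract describes.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hidden_features(x, W, b):
    """Shared hidden layer: one sigmoid unit per row of W."""
    return [sigmoid(sum(w_i * x_i for w_i, x_i in zip(row, x)) + b_j)
            for row, b_j in zip(W, b)]

def mispronunciation_prob(x, phone, W, b, heads):
    """Score input x with the logistic output node that belongs to `phone`."""
    v, c = heads[phone]
    h = hidden_features(x, W, b)
    return sigmoid(sum(v_j * h_j for v_j, h_j in zip(v, h)) + c)

def train_head(samples, phone, W, b, heads, lr=0.5, epochs=200):
    """Fit one phone's output node by gradient descent on cross-entropy.

    The shared hidden layer (W, b) stays frozen; only this phone's
    output weights are updated.
    """
    v, c = heads[phone]
    for _ in range(epochs):
        for x, y in samples:
            h = hidden_features(x, W, b)
            err = sigmoid(sum(v_j * h_j for v_j, h_j in zip(v, h)) + c) - y
            v = [v_j - lr * err * h_j for v_j, h_j in zip(v, h)]
            c -= lr * err
    heads[phone] = (v, c)

# Assumption: random weights stand in for a hidden layer pre-trained on pooled data.
rng = random.Random(0)
n_in, n_hidden = 2, 4
W = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b = [0.0] * n_hidden

# One (weights, bias) output node per phone; phone labels are hypothetical.
heads = {p: ([0.0] * n_hidden, 0.0) for p in ("AE", "TH")}

# Toy data for phone "AE": label 1 = mispronounced, 0 = correct.
toy_ae = [([2.0, 0.0], 1), ([-2.0, 0.0], 0)]
train_head(toy_ae, "AE", W, b, heads)
print(mispronunciation_prob([2.0, 0.0], "AE", W, b, heads))
print(mispronunciation_prob([-2.0, 0.0], "AE", W, b, heads))
```

Because every phone reuses the same `hidden_features`, adding a phone in this sketch only adds one small output node, which is the sense in which a shared representation avoids training each classifier entirely on its own.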
Pages: 154-166
Number of pages: 13