Automatic speech recognition based on weighted minimum classification error (W-MCE) training method

Cited by: 3
Authors
Fu, Qiang [1]
Juang, Biing-Hwang [1]
Affiliation
[1] Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA
Keywords
non-uniform error cost; weighted MCE
DOI
10.1109/ASRU.2007.4430124
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
Bayes decision theory [1] is the foundation of the classical statistical approach to pattern recognition. For most pattern recognition problems, Bayes decision theory is applied under the assumption that system performance is measured by simple error counting, which assigns an identical cost to every recognition error. However, this prevalent performance metric is undesirable in many practical applications. For example, keyword spotting systems require the cost of a recognition error to be differentiated. In this paper, we propose an extended framework for the speech recognition problem with non-uniform classification/recognition error cost. As the system performance metric, the recognition error is weighted according to the task objective. Bayes decision theory is applied to this performance metric, and the decision rule with a non-uniform error cost function is derived. We argue that the minimum classification error (MCE) method, after appropriate generalization, is the most suitable training algorithm for designing the "optimal" classifier that minimizes the weighted error rate. We formulate the weighted MCE (W-MCE) algorithm on the conventional MCE infrastructure by integrating the error cost and the recognition error count into one objective function. In the context of automatic speech recognition (ASR), we present a variety of training scenarios and weighting strategies under this extended framework. Experiments on large vocabulary continuous speech recognition are provided to support the effectiveness of our approach.
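The two ideas named in the abstract, a decision rule under non-uniform error cost and an objective that combines error cost with a smoothed error count, can be sketched as follows. This is a minimal illustration and not the authors' formulation: the function names, the cost matrix layout, the scalar per-token weight `error_cost`, and the smoothing parameters `eta` and `gamma` are all assumptions made for the example. The misclassification measure and sigmoid smoothing follow the textbook MCE form; how the paper actually assigns and applies the weights is not stated in the abstract.

```python
import numpy as np

def min_risk_decision(posteriors, cost):
    """Choose the class with the lowest expected cost (minimum-risk Bayes rule).

    posteriors: shape (M,), posteriors[j] = P(class j | x)
    cost:       shape (M, M), cost[i, j] = cost of deciding i when the truth is j
                (hypothetical layout chosen for this sketch)
    """
    risk = cost @ posteriors  # risk[i] = sum_j cost[i, j] * P(class j | x)
    return int(np.argmin(risk))

def weighted_mce_loss(scores, label, error_cost, eta=2.0, gamma=1.0):
    """Cost-weighted, sigmoid-smoothed error for one training token.

    scores:     shape (M,), discriminant values g_j(x)
    label:      index of the correct class
    error_cost: assumed scalar weight for misrecognizing this token
    """
    competitors = np.delete(scores, label)
    # Soft maximum over the competing classes (standard MCE anti-discriminant).
    anti = np.log(np.mean(np.exp(eta * competitors))) / eta
    d = -scores[label] + anti  # d > 0 indicates a misclassification
    # The sigmoid approximates the 0/1 error count; the weight scales its cost.
    return error_cost / (1.0 + np.exp(-gamma * d))

posteriors = np.array([0.6, 0.4])
uniform = np.array([[0.0, 1.0], [1.0, 0.0]])
# Missing class 1 is assumed to be five times as costly as a false alarm.
nonuniform = np.array([[0.0, 5.0], [1.0, 0.0]])
decision_uniform = min_risk_decision(posteriors, uniform)
decision_weighted = min_risk_decision(posteriors, nonuniform)

scores = np.array([2.0, 0.0])
loss_correct = weighted_mce_loss(scores, label=0, error_cost=1.0)
loss_wrong = weighted_mce_loss(scores, label=1, error_cost=1.0)
loss_wrong_costly = weighted_mce_loss(scores, label=1, error_cost=3.0)
```

With uniform cost the rule reduces to picking the largest posterior; with the non-uniform matrix the same posteriors lead to the other class because its errors are cheaper. In the loss, a larger `error_cost` scales the contribution of that token to the training objective, so costly errors pull harder on the model parameters.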
Pages: 278-283
Number of pages: 6
Related papers
50 in total
  • [1] Minimum classification error factor analysis (MCE-FA) for automatic speech recognition
    Rahim, M
    Saul, L
    1997 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, PROCEEDINGS, 1997, : 172 - 178
  • [2] Minimum word classification error training of HMMs for automatic speech recognition
    Yan, Zhi-Jie
    Zhu, Bo
    Hu, Yu
    Wang, Ren-Hua
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 4521 - 4524
  • [3] A frequency-weighted HMM based on minimum error classification for noisy speech recognition
    Matsumoto, H
    Ono, M
    1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V: VOL I: PLENARY, EXPERT SUMMARIES, SPECIAL, AUDIO, UNDERWATER ACOUSTICS, VLSI; VOL II: SPEECH PROCESSING; VOL III: SPEECH PROCESSING, DIGITAL SIGNAL PROCESSING; VOL IV: MULTIDIMENSIONAL SIGNAL PROCESSING, NEURAL NETWORKS - VOL V: STATISTICAL SIGNAL AND ARRAY PROCESSING, APPLICATIONS, 1997, : 1511 - 1514
  • [4] Phone-Discriminating Minimum Classification Error (P-MCE) Training for Phonetic Recognition
    Fu, Qiang
    He, Xiaodong
    Deng, Li
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 2197 - +
  • [5] Maximum likelihood and minimum classification error factor analysis for automatic speech recognition
    Saul, LK
    Rahim, MG
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2000, 8 (02): : 115 - 125
  • [6] PROTOTYPE-BASED MINIMUM ERROR TRAINING FOR SPEECH RECOGNITION
    MCDERMOTT, E
    KATAGIRI, S
    APPLIED INTELLIGENCE, 1994, 4 (03) : 245 - 256
  • [7] A Modified Minimum Classification Error (MCE) Training Algorithm for Dimensionality Reduction
    Xuechuan Wang
    Kuldip K. Paliwal
    Journal of VLSI signal processing systems for signal, image and video technology, 2002, 32 : 19 - 28
  • [8] A modified Minimum Classification Error (MCE) training algorithm for dimensionality reduction
    Wang, XC
    Paliwal, KK
    JOURNAL OF VLSI SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2002, 32 (1-2): : 19 - 28
  • [9] Audio-visual speech recognition using minimum classification error training
    Miyajima, C
    Tokuda, K
    Kitamura, T
    NEURAL NETWORKS FOR SIGNAL PROCESSING X, VOLS 1 AND 2, PROCEEDINGS, 2000, : 3 - 12
  • [10] Audio-visual speech recognition using minimum classification error training
    Miyajima, Chiyomi
    Tokuda, Keiichi
    Kitamura, Tadashi
    Neural Networks for Signal Processing - Proceedings of the IEEE Workshop, 2000, 1 : 3 - 12