Discriminative auditory-based features for robust speech recognition

Cited by: 21
Authors
Mak, B. K. W. [1]
Tam, Y. C. [2]
Li, P. Q. [3]
Affiliations
[1] Hong Kong Univ Sci & Technol, Dept Comp Sci, Hong Kong, People's Republic of China
[2] Carnegie Mellon Univ, Pittsburgh, PA 15213, USA
[3] Li Creat Technol Inc, New Providence, NJ 07974, USA
Source
IEEE Transactions on Speech and Audio Processing
Keywords
auditory-based filter; discriminative feature extraction; generalized probabilistic descent; minimum classification error
DOI
10.1109/TSA.2003.819951
Chinese Library Classification (CLC)
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Recently, a new auditory-based feature extraction algorithm for robust speech recognition in noisy environments was proposed. The new features are derived by closely mimicking the human peripheral auditory process, and the filters modeling the outer ear, middle ear, and inner ear are taken from the psychoacoustics literature with some manual adjustments. In this paper, we extend the auditory-based feature extraction algorithm and propose to further train the auditory-based filters through discriminative training. Using a data-driven approach, we optimize the filters by minimizing the recognition error on the task at hand. One significant contribution over similar past efforts (generally under the name of "discriminative feature extraction") is that we make no assumption about the parametric form of the auditory-based filters. Instead, we only require the filters to be triangular-like: the filter weights reach a maximum in the middle and decrease monotonically toward both ends. Discriminative training of these constrained auditory-based filters leads to improved performance. Furthermore, we study a combined discriminative training procedure for both the feature and acoustic-model parameters. Our experiments show that the best performance is obtained with a sequential procedure under the unified framework of minimum classification error (MCE) training with generalized probabilistic descent (GPD).
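Illustration (not from the paper): the sketch below is a minimal, hypothetical rendering of the abstract's two key ideas in Python/NumPy. First, a "triangular-like" filter can be built without assuming any specific parametric form by constructing each flank from cumulative sums of positive increments, which guarantees a single peak with monotonically decreasing weights on both sides. Second, the smoothed MCE loss is the standard differentiable surrogate for the error count that GPD minimizes. All function names, the softplus reparameterization, and the constants gamma and eta are assumptions chosen for clarity, not the authors' implementation.

import numpy as np

def softplus(x):
    # Smooth map from unconstrained reals to positive values, so the raw
    # parameters can be updated freely by gradient descent (as in GPD)
    # while the filter weights remain valid. (Assumed reparameterization.)
    return np.log1p(np.exp(x))

def triangular_like_filter(theta_rise, theta_fall):
    # Build one "triangular-like" filter: weights increase monotonically
    # to a single interior peak and then decrease monotonically to both
    # ends. Cumulative sums of positive increments enforce monotonicity
    # on each flank without fixing any parametric shape.
    rise = np.cumsum(softplus(theta_rise))        # strictly increasing flank
    fall = np.cumsum(softplus(theta_fall))[::-1]  # strictly decreasing flank
    peak = max(rise[-1], fall[0]) + 1.0           # peak dominates both flanks
    w = np.concatenate([rise, [peak], fall])
    return w / w.max()                            # normalize the peak to 1

def mce_loss(scores, label, gamma=1.0, eta=10.0):
    # Smoothed MCE loss for one training token: scores[j] is the
    # discriminant value of class j. d > 0 marks a misclassification; the
    # sigmoid turns the 0/1 error count into a differentiable loss that
    # GPD can descend with respect to filter and model parameters alike.
    g_correct = scores[label]
    rivals = np.delete(scores, label)
    g_rival = np.log(np.mean(np.exp(eta * rivals))) / eta  # soft best rival
    d = -g_correct + g_rival
    return 1.0 / (1.0 + np.exp(-gamma * d))

# Hypothetical usage: a 7-point filter and one 3-class token.
w = triangular_like_filter(np.zeros(3), np.zeros(3))
assert np.all(np.diff(w[:4]) > 0) and np.all(np.diff(w[3:]) < 0)
loss = mce_loss(np.array([2.0, 1.5, -0.3]), label=0)

In the paper's setting the discriminant values would come from HMM scores computed on features produced by such a filter bank; scalar scores stand in for them here.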
Pages: 27-36
Page count: 10
Related papers
50 items in total
  • [21] Robust speech recognition method based on discriminative environment feature extraction
    Han, J. Q.
    Gao, W.
    Journal of Computer Science and Technology, 2001, 16(5): 458-464
  • [24] Robust endpoint detection for speech recognition based on discriminative feature extraction
    Yamamoto, Koichi
    Jabloun, Firas
    Reinhard, Klaus
    Kawamura, Akinori
    2006 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2006: 805-808
  • [25] Discriminative output coding features for speech recognition
    Dehzangi, Omid
    Ma, Bin
    Chng, Eng Siong
    Li, Haizhou
    2008 6th International Symposium on Chinese Spoken Language Processing (ISCSLP), 2008: 89-92
  • [26] Jointly optimized discriminative features for speech recognition
    Ng, Tim
    Zhang, Bing
    Nguyen, Long
    11th Annual Conference of the International Speech Communication Association (INTERSPEECH 2010), 2010: 2626-2629
  • [27] On a generalization of margin-based discriminative training to robust speech recognition
    Li, Jinyu
    Lee, Chin-Hui
    INTERSPEECH 2008: 9th Annual Conference of the International Speech Communication Association, 2008: 1992-1995
  • [28] An auditory-based distortion measure with application to concatenative speech synthesis
    Hansen, J. H. L.
    Chappell, D. T.
    IEEE Transactions on Speech and Audio Processing, 1998, 6(5): 489-495
  • [29] Learning-based auditory encoding for robust speech recognition
    Chiu, Yu-Hsiang Bosco
    Raj, Bhiksha
    Stern, Richard M.
    IEEE Transactions on Audio, Speech, and Language Processing, 2012, 20(3): 900-914
  • [30] Learning-based auditory encoding for robust speech recognition
    Chiu, Yu-Hsiang Bosco
    Raj, Bhiksha
    Stern, Richard M.
    2010 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2010: 4278-4281