Discriminative auditory-based features for robust speech recognition

Cited by: 21
Authors
Mak, BKW [1]
Tam, YC
Li, PQ
Affiliations
[1] Hong Kong Univ Sci & Technol, Dept Comp Sci, Hong Kong, Hong Kong, Peoples R China
[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[3] Li Creat Technol Inc, New Providence, NJ 07974 USA
Source
Keywords
auditory-based filter; discriminative feature extraction; generalized probabilistic descent; minimum classification error
DOI
10.1109/TSA.2003.819951
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Recently, a new auditory-based feature extraction algorithm for robust speech recognition in noisy environments was proposed. The new features are derived by closely mimicking the human peripheral auditory process, and the filters in the outer ear, middle ear, and inner ear are obtained from the psychoacoustics literature with some manual adjustments. In this paper, we extend the auditory-based feature extraction algorithm and propose to further train the auditory-based filters through discriminative training. Using this data-driven approach, we optimize the filters by minimizing the subsequent recognition errors on a task. One significant contribution over similar past efforts (generally under the name of "discriminative feature extraction") is that we make no assumption about the parametric form of the auditory-based filters. Instead, we only require the filters to be triangular-like: the filter weights have a maximum value in the middle and then decrease monotonically toward both ends. Discriminative training of these constrained auditory-based filters leads to improved performance. Furthermore, we study the combined discriminative training procedure for both feature and acoustic model parameters. Our experiments show that the best performance is obtained with a sequential procedure under the unified framework of minimum classification error (MCE) training with generalized probabilistic descent (GPD).
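The "triangular-like" constraint in the abstract (a maximum in the middle, with weights decreasing monotonically toward both ends) can be illustrated with a minimal Python sketch. The abstract does not say how the paper enforces the constraint during training, so the reparameterization below is only an assumption for illustration: each step away from the peak multiplies the weight by a sigmoid-squashed factor in (0, 1). The function names `is_triangular_like` and `triangular_like_filter` are hypothetical.

```python
import numpy as np


def is_triangular_like(weights, tol=0.0):
    """Return True if the weights rise to a single peak and then fall."""
    w = np.asarray(weights, dtype=float)
    peak = int(np.argmax(w))
    rising = np.all(np.diff(w[:peak + 1]) >= -tol)   # non-decreasing up to the peak
    falling = np.all(np.diff(w[peak:]) <= tol)       # non-increasing after the peak
    return bool(rising and falling)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def triangular_like_filter(left_params, right_params, peak=1.0):
    """Map unconstrained parameters to triangular-like filter weights.

    ASSUMPTION (not taken from the paper): every step away from the peak
    scales the weight by sigmoid(param), a factor in (0, 1), so the weights
    shrink monotonically toward both ends for any real-valued parameters.
    Parameters are ordered from the peak outward on each side.
    """
    left_decay = np.cumprod(sigmoid(np.asarray(left_params, dtype=float)))
    right_decay = np.cumprod(sigmoid(np.asarray(right_params, dtype=float)))
    left = peak * left_decay[::-1]      # reverse so indices run left to right
    right = peak * right_decay
    return np.concatenate([left, [peak], right])


if __name__ == "__main__":
    w = triangular_like_filter([0.5, -1.0, 2.0], [0.0, 0.0])
    print(np.round(w, 4))
    print(is_triangular_like(w))
```

With a parameterization of this kind, a gradient-based procedure could update the unconstrained parameters freely while the resulting filter always stays triangular-like; whether the paper uses this or another mechanism is not stated in the abstract.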
Pages: 27-36
Page count: 10
Related papers
  • [41] Sparse Auditory Reproducing Kernel (SPARK) Features for Noise-Robust Speech Recognition
    Fazel, Amin
    Chakrabartty, Shantanu
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2012, 20 (04): 1362-1371
  • [42] Robust features for speech recognition based on admissible wavelet packets
    Farooq, O
    Datta, S
    ELECTRONICS LETTERS, 2001, 37 (25): 1554-1556
  • [43] ROBUST EXCITATION-BASED FEATURES FOR AUTOMATIC SPEECH RECOGNITION
    Drugman, Thomas
    Stylianou, Yannis
    Chen, Langzhou
    Chen, Xie
    Gales, Mark J. F.
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015: 4664-4668
  • [44] STRUCTURED DISCRIMINATIVE MODELS FOR NOISE ROBUST CONTINUOUS SPEECH RECOGNITION
    Ragni, A.
    Gales, M. J. F.
    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011: 4788-4791
  • [45] Discriminative classifiers with adaptive kernels for noise robust speech recognition
    Gales, M. J. F.
    Flego, F.
    COMPUTER SPEECH AND LANGUAGE, 2010, 24 (04): 648-662
  • [46] STEREO-BASED STOCHASTIC MAPPING WITH DISCRIMINATIVE TRAINING FOR NOISE ROBUST SPEECH RECOGNITION
    Cui, Xiaodong
    Afify, Mohamed
    Gao, Yuqing
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS, 2009: 3933-+
  • [47] DUAL-CHANNEL ITERATIVE SPEECH ENHANCEMENT WITH CONSTRAINTS ON AN AUDITORY-BASED SPECTRUM
    Nandkumar, S
    Hansen, JHL
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1995, 3 (01): 22-34
  • [48] Integrating Adaptive Beam-forming and Auditory Features for Robust Large Vocabulary Speech Recognition
    Sun, Xie
    Li, Qi
    Zhu, Manli
    Zhou, Qiru
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012: 2113-2114
  • [49] Modeling auditory perception to improve robust speech recognition
    Strope, B
    Alwan, A
    THIRTY-FIRST ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS, VOLS 1 AND 2, 1998: 1056-1060
  • [50] Noise suppression based on auditory-like filters for robust speech recognition
    Zhao, JH
    Xie, X
    Kuang, JM
    2002 6TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS, VOLS I AND II, 2002: 560-563