Sparse Auditory Reproducing Kernel (SPARK) Features for Noise-Robust Speech Recognition

Cited by: 14
Authors:
Fazel, Amin [1]
Chakrabartty, Shantanu [1]
Affiliation:
[1] Michigan State Univ, Dept Elect & Comp Engn, E Lansing, MI 48824 USA
Keywords: Auditory HMAX; gammatone functions; reproducing kernel Hilbert space (RKHS); robust speech recognition; sparse features; HIERARCHICAL ORGANIZATION; JOINT COMPENSATION; CEPSTRAL ANALYSIS; RECEPTIVE-FIELDS; ADAPTATION; PERCEPTION; MODEL
DOI: 10.1109/TASL.2011.2179294
Chinese Library Classification (CLC): O42 [Acoustics]
Subject Classification Codes: 070206; 082403
Abstract:
In this paper, we present a novel speech feature extraction algorithm based on a hierarchical combination of auditory similarity and pooling functions. The computationally efficient features, known as "Sparse Auditory Reproducing Kernel" (SPARK) coefficients, are extracted under the hypothesis that the noise-robust information in the speech signal is embedded in a reproducing kernel Hilbert space (RKHS) spanned by overcomplete, nonlinear, and time-shifted gammatone basis functions. The feature extraction algorithm first computes a kernel-based similarity between the speech signal and the time-shifted gammatone functions, then prunes the features using a simple pooling technique (the "MAX" operation). We also describe the effect of different hyper-parameters and kernel functions on the performance of a SPARK-based speech recognizer. Experimental results on the standard AURORA2 dataset demonstrate that the SPARK-based speech recognizer delivers consistent improvements in word accuracy compared with a baseline speech recognizer trained using the standard ETSI STQ WI008 DSR features.
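To make the two-stage pipeline concrete, here is a minimal Python sketch of SPARK-style feature extraction for a single speech frame: first, a kernel-based similarity is computed between the frame and a dictionary of time-shifted gammatone basis functions; then a "MAX" pooling over the time shifts prunes the similarity map to one coefficient per channel. The abstract does not specify the authors' kernel or hyper-parameter choices, so the Gaussian (RBF) kernel, the ERB-style gammatone parameterization, and all values below (n_channels, n_shifts, gamma) are illustrative assumptions rather than the published implementation.

    import numpy as np

    def gammatone(t, fc, order=4, b=1.019):
        # Gammatone impulse response: t^(n-1) * exp(-2*pi*b*ERB(fc)*t) * cos(2*pi*fc*t)
        erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)      # ERB bandwidth (Glasberg & Moore)
        env = t ** (order - 1) * np.exp(-2.0 * np.pi * b * erb * t)
        g = env * np.cos(2.0 * np.pi * fc * t)
        return g / (np.linalg.norm(g) + 1e-12)       # unit-norm basis function

    def spark_like_features(frame, fs=8000, n_channels=16, n_shifts=32, gamma=1.0):
        # Stage 1: kernel similarity to time-shifted gammatone bases.
        # Stage 2: "MAX" pooling over the time shifts (one coefficient per channel).
        # NOTE: the kernel and all hyper-parameters here are illustrative assumptions.
        n = len(frame)
        t = np.arange(n) / fs
        fcs = np.geomspace(100.0, 0.45 * fs, n_channels)       # log-spaced center frequencies
        shifts = np.linspace(0, n // 2, n_shifts).astype(int)  # time shifts in samples
        feats = np.empty(n_channels)
        for c, fc in enumerate(fcs):
            g = gammatone(t, fc)
            best = -np.inf
            for s in shifts:
                gs = np.roll(g, s)
                gs[:s] = 0.0                                   # shifted, zero-padded basis
                d2 = np.sum((frame - gs) ** 2)                 # squared distance to the basis
                best = max(best, np.exp(-gamma * d2))          # Gaussian (RBF) kernel similarity
            feats[c] = best                                    # MAX pooling over shifts
        return feats

    # Usage: a 25-ms frame at 8 kHz containing a noisy 500-Hz gammatone pulse.
    fs = 8000
    t = np.arange(200) / fs
    frame = gammatone(t, 500.0) + 0.01 * np.random.randn(200)
    print(spark_like_features(frame, fs=fs))

Because each channel keeps only its best-matching shift, the resulting vector is compact and tolerant to small temporal misalignments, which is the intuition behind the pooling stage described in the abstract.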
Pages: 1362-1371
Page count: 10
Related Papers (50 in total)
  • [1] Sparse Kernel Cepstral Coefficients (SKCC): Inner-product based Features for Noise-Robust Speech Recognition
    Fazel, Amin
    Chakrabartty, Shantanu
    [J]. 2011 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2011, : 2401 - 2404
  • [2] Modeling human auditory perception for noise-robust speech recognition
    Lee, SY
    [J]. PROCEEDINGS OF THE 2005 INTERNATIONAL CONFERENCE ON NEURAL NETWORKS AND BRAIN, VOLS 1-3, 2005, : PL72 - PL74
  • [3] Sparse coding of the modulation spectrum for noise-robust automatic speech recognition
    Ahmadi, Sara
    Ahadi, Seyed Mohammad
    Cranen, Bert
    Boves, Lou
    [J]. EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2014, : 1 - 20
  • [4] Kernel Power Flow Orientation Coefficients for Noise-Robust Speech Recognition
    Gerazov, Branislav
    Ivanovski, Zoran
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2015, 23 (02) : 407 - 419
  • [5] Noise-Robust Algorithm of Speech Features Extraction for Automatic Speech Recognition System
    Yakhnev, A. N.
    Pisarev, A. S.
    [J]. PROCEEDINGS OF THE XIX IEEE INTERNATIONAL CONFERENCE ON SOFT COMPUTING AND MEASUREMENTS (SCM 2016), 2016, : 206 - 208
  • [6] Mapping Sparse Representation to State Likelihoods in Noise-Robust Automatic Speech Recognition
    Mahkonen, Katariina
    Hurmalainen, Antti
    Virtanen, Tuomas
    Gemmeke, Jort
    [J]. 12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 472 - +
  • [7] Fusion Feature Extraction Based on Auditory and Energy for Noise-Robust Speech Recognition
    Shi, Yanyan
    Bai, Jing
    Xue, Peiyun
    Shi, Dianxi
    [J]. IEEE ACCESS, 2019, 7 : 81911 - 81922
  • [8] Noise-Robust Speech Recognition of Conversational Telephone Speech
    Chen, Gang
    Tolba, Hesham
    O'Shaughnessy, Douglas
    [J]. INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1101 - 1104
  • [9] An overview of noise-robust automatic speech recognition
    Li, Jinyu
    Deng, Li
    Gong, Yifan
    Haeb-Umbach, Reinhold
[J]. IEEE TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING, 2014, 22 (04) : 745 - 777