NOISE-ROBUST SPEECH RECOGNITION WITH EXEMPLAR-BASED SPARSE REPRESENTATIONS USING ALPHA-BETA DIVERGENCE

被引:0
|
作者
Yilmaz, Emre [1 ]
Gemmeke, Jort F. [1 ]
Van Hamme, Hugo [1 ]
机构
[1] Katholieke Univ Leuven, Dept ESAT, Leuven, Belgium
关键词
exemplar-based speech recognition; sparse representations; alpha-beta divergence; noise-robustness; NONNEGATIVE MATRIX FACTORIZATION;
D O I
暂无
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
In this paper, we investigate the performance of a noise-robust sparse representations (SR)-based recognizer using the Alpha-Beta (AB)divergence to compare the noisy speech segments and exemplars. The baseline recognizer, which approximates noisy speech segments as a linear combination of speech and noise exemplars of variable length, uses the generalized Kullback-Leibler divergence to quantify the approximation quality. Incorporating a reconstruction errorbased back-end, the recognition performance highly depends on the congruence of the divergence measure and used speech features. Having two tuning parameters, namely alpha and beta, the AB-divergence provides improved robustness against background noise and outliers. These parameters can be adjusted for better performance depending on the distribution of speech and noise exemplars in the high-dimensional feature space. Moreover, various well-known distance/divergence measures such as the Euclidean distance, generalized Kullback-Leibler divergence, Itakura-Saito divergence and Hellinger distance are special cases of the AB-divergence for different (alpha, beta) values. The goal of this work is to investigate the optimal divergence for mel-scaled magnitude spectral features by performing recognition experiments at several SNR levels using different (alpha, beta) pairs. The results demonstrate the effectiveness of the AB-divergence compared to the generalized Kullback-Leibler divergence especially at the lower SNR levels.
引用
收藏
页数:5
相关论文
共 50 条
  • [21] Sparse Auditory Reproducing Kernel (SPARK) Features for Noise-Robust Speech Recognition
    Fazel, Amin
    Chakrabartty, Shantanu
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2012, 20 (04): : 1362 - 1371
  • [22] Noise-robust feature based on sparse representation for speaker recognition
    Qi, Hongzhuo
    [J]. Metallurgical and Mining Industry, 2015, 7 (04): : 64 - 69
  • [23] Noise-robust speech recognition based on difference of power spectrum
    Xu, JF
    Wei, G
    [J]. ELECTRONICS LETTERS, 2000, 36 (14) : 1247 - 1248
  • [24] Noise-Robust Speech Recognition Based on RBF Neural Network
    Hou, Xuemei
    [J]. HIGH PERFORMANCE STRUCTURES AND MATERIALS ENGINEERING, PTS 1 AND 2, 2011, 217-218 : 413 - 418
  • [25] A speech emphasis method for noise-robust speech recognition by using repetitive phrase
    Hirai, Takanori
    Kuroiwa, Shingo
    Tsuge, Satoru
    Ren, Fuji
    Fattah, Mohamed Abdel
    [J]. 2006 10TH INTERNATIONAL CONFERENCE ON COMMUNICATION TECHNOLOGY, VOLS 1 AND 2, PROCEEDINGS, 2006, : 1269 - +
  • [26] Joint Denoising and Dereverberation Using Exemplar-Based Sparse Representations and Decaying Norm Constraint
    Baby, Deepak
    Van Hamme, Hugo
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2017, 25 (10) : 2024 - 2035
  • [27] A Noise-Robust Speech Recognition System Based on Wavelet Neural Network
    Wang, Yiping
    Zhao, Zhefeng
    [J]. ARTIFICIAL INTELLIGENCE AND COMPUTATIONAL INTELLIGENCE, PT III, 2011, 7004 : 392 - 397
  • [28] Sparse Kernel Cepstral Coefficients (SKCC): Inner-product based Features for Noise-Robust Speech Recognition
    Fazel, Amin
    Chakrabartty, Shantanu
    [J]. 2011 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2011, : 2401 - 2404
  • [29] Noise-robust cellular phone speech recognition using CODEC-adapted speech and noise models
    Kato, T
    Naito, M
    Shimizu, T
    [J]. 2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002, : 285 - 288
  • [30] Noise-robust automatic speech recognition using a predictive echo state network
    Skowronski, Mark D.
    Harris, John G.
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (05): : 1724 - 1730