Low-variance Multitaper Mel-frequency Cepstral Coefficient Features for Speech and Speaker Recognition Systems

Cited by: 17
Authors
Alam, Md. Jahangir [1, 2]
Kenny, Patrick [2]
O'Shaughnessy, Douglas [1]
Affiliations
[1] Univ Quebec, INRS EMT, Montreal, PQ H3C 3P8, Canada
[2] CRIM, Montreal, PQ, Canada
Keywords
Speech recognition; Speaker verification; Multitaper spectrum; AURORA-2; AURORA-4; NIST 2010 SRE; Spectral analysis
DOI
10.1007/s12559-012-9197-5
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we investigate low-variance multitaper spectrum estimation methods for computing mel-frequency cepstral coefficient (MFCC) features for robust speech and speaker recognition systems. In speech and speaker recognition, MFCC features are usually computed from a single-tapered (e.g., Hamming-windowed) direct spectrum estimate, that is, the squared magnitude of the Fourier transform of the observed signal. Compared with the periodogram, a power spectrum estimate that uses a smooth window function, such as the Hamming window, can reduce spectral leakage. Windowing may help to reduce spectral bias, but the variance often remains high. A multitaper spectrum estimation method that uses well-selected tapers can gain from the bias-variance trade-off, giving an estimate that has small bias compared with a single-taper spectrum estimate but substantially lower variance. Speech recognition experiments on the AURORA-2 and AURORA-4 corpora and speaker verification experiments on the NIST 2010 speaker recognition evaluation corpus (telephone as well as microphone speech) show that the multitaper methods outperform the Hamming-windowed spectrum estimation method. In the speaker verification task, compared with the Hamming window technique, the sinusoidal weighted cepstrum estimator, multi-peak, and Thomson multitaper techniques provide relative improvements in equal error rate of 20.25%, 18.73%, and 12.83%, respectively.
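The variance reduction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes orthonormal sine tapers with uniform weights (the weighted estimators named in the abstract use other taper and weight choices), a 256-sample frame, 6 tapers, and synthetic white noise in place of speech. The function names `sine_tapers`, `multitaper_power_spectrum`, and `hamming_power_spectrum` are hypothetical.

```python
import numpy as np


def sine_tapers(frame_len, num_tapers):
    """Return a (num_tapers, frame_len) array of orthonormal sine tapers."""
    n = np.arange(1, frame_len + 1)
    k = np.arange(1, num_tapers + 1)[:, None]
    return np.sqrt(2.0 / (frame_len + 1)) * np.sin(np.pi * k * n / (frame_len + 1))


def multitaper_power_spectrum(frame, tapers, weights=None):
    """Weighted average of the power spectra obtained with each taper."""
    num_tapers = tapers.shape[0]
    if weights is None:
        # Assumption: uniform weights; other weighting schemes are possible.
        weights = np.full(num_tapers, 1.0 / num_tapers)
    spectra = np.abs(np.fft.rfft(tapers * frame, axis=1)) ** 2
    return weights @ spectra


def hamming_power_spectrum(frame):
    """Single-taper baseline: Hamming window scaled to unit energy."""
    window = np.hamming(len(frame))
    window = window / np.linalg.norm(window)
    return np.abs(np.fft.rfft(window * frame)) ** 2


# Synthetic test signal (assumption): unit-variance white noise frames,
# whose true power spectrum is flat, so any spread is estimator variance.
rng = np.random.default_rng(0)
frames = rng.standard_normal((500, 256))
tapers = sine_tapers(256, 6)

mt = np.array([multitaper_power_spectrum(f, tapers) for f in frames])
hm = np.array([hamming_power_spectrum(f) for f in frames])

# Average, over frequency bins, of the across-frame variance of each estimate.
mt_var = mt.var(axis=0).mean()
hm_var = hm.var(axis=0).mean()
print(f"Hamming variance: {hm_var:.3f}, multitaper variance: {mt_var:.3f}")
```

In an MFCC front end, the multitaper estimate would replace the Hamming-windowed power spectrum before the mel filterbank, logarithm, and discrete cosine transform stages; those later stages are unchanged.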
Pages: 533-544
Page count: 12
Related papers
50 in total
  • [1] Low-variance Multitaper Mel-frequency Cepstral Coefficient Features for Speech and Speaker Recognition Systems
    Md. Jahangir Alam
    Patrick Kenny
    Douglas O’Shaughnessy
    [J]. Cognitive Computation, 2013, 5 : 533 - 544
  • [2] Mel-Frequency Cepstral Coefficient Analysis in Speech Recognition
    On, Chin Kim
    Pandiyan, Paulraj M.
    Yaacob, Sazali
    Saudi, Azali
    [J]. 2006 INTERNATIONAL CONFERENCE ON COMPUTING & INFORMATICS (ICOCI 2006), 2006, : 291 - +
  • [3] Mel-Frequency Cepstral Coefficients as Features for Automatic Speaker Recognition
    Jokic, Ivan D.
    Jokic, Stevan D.
    Delic, Vlado D.
    Peric, Zoran H.
    [J]. 2015 23RD TELECOMMUNICATIONS FORUM TELFOR (TELFOR), 2015, : 419 - 424
  • [4] Modified Mel-Frequency cepstral coefficient
    Saha, G
    Yadhunandan, US
    [J]. Proceedings of the Sixth IASTED International Conference on Signal and Image Processing, 2004, : 215 - 219
  • [5] Robust Speech Recognition Using Perceptual Wavelet Denoising and Mel-frequency Product Spectrum Cepstral Coefficient Features
    Korba, Mohamed Cherif Amara
    Messadeg, Djemil
    Djemili, Rafik
    Bourouba, Hocine
    [J]. INFORMATICA-JOURNAL OF COMPUTING AND INFORMATICS, 2008, 32 (03): : 283 - 288
  • [6] Speaker independent phoneme recognition based on fractal dimension (DF) and the mel-frequency cepstral coefficients features
    Fekkai, S
    Al-Akaidi, M
    Blackledge, JM
    [J]. 2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS, 2001, : 4014 - 4014
  • [7] Low-Variance Multitaper MFCC Features: A Case Study in Robust Speaker Verification
    Kinnunen, Tomi
    Saeidi, Rahim
    Sedlak, Filip
    Lee, Kong Aik
    Sandberg, Johan
    Hansson-Sandsten, Maria
    Li, Haizhou
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2012, 20 (07): : 1990 - 2001
  • [8] Mel-Frequency Cepstral Coefficient-Based Bandwidth Extension of Narrowband Speech
    Nour-Eldin, Amr H.
    Kabal, Peter
    [J]. INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 53 - 56
  • [9] Analysis and prediction of acoustic speech features from mel-frequency cepstral coefficients in distributed speech recognition architectures
    Darch, Jonathan
    Milner, Ben
    Vaseghi, Saeed
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2008, 124 (06): : 3989 - 4000