Low-variance Multitaper Mel-frequency Cepstral Coefficient Features for Speech and Speaker Recognition Systems

Cited by: 17
Authors
Alam, Md. Jahangir [1, 2]
Kenny, Patrick [2]
O'Shaughnessy, Douglas [1]
Affiliations
[1] Univ Quebec, INRS EMT, Montreal, PQ H3C 3P8, Canada
[2] CRIM, Montreal, PQ, Canada
Keywords
Speech recognition; Speaker verification; Multitaper spectrum; AURORA-2; AURORA-4; NIST 2010 SRE; Spectral analysis
DOI
10.1007/s12559-012-9197-5
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we investigate low-variance multitaper spectrum estimation methods to compute the mel-frequency cepstral coefficient (MFCC) features for robust speech and speaker recognition systems. In speech and speaker recognition, MFCC features are usually computed from a single-tapered (e.g., Hamming-windowed) direct spectrum estimate, that is, the squared magnitude of the Fourier transform of the observed signal. Compared with the periodogram, a power spectrum estimate that uses a smooth window function, such as the Hamming window, can reduce spectral leakage. Windowing may help to reduce spectral bias, but variance often remains high. A multitaper spectrum estimation method that uses well-selected tapers can gain from the bias-variance trade-off, giving an estimate that has small bias compared with a single-taper spectrum estimate but substantially lower variance. Speech recognition experiments on the AURORA-2 and AURORA-4 corpora and speaker verification experiments on the NIST 2010 speaker recognition evaluation corpus (telephone as well as microphone speech) show that the multitaper methods outperform the Hamming-windowed spectrum estimation method. In the speaker verification task, compared with the Hamming window technique, the sinusoidal weighted cepstrum estimator, multi-peak, and Thomson multitaper techniques provide relative improvements of 20.25%, 18.73%, and 12.83%, respectively, in equal error rate.
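
The bias-variance argument in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes sine tapers with uniform weights, whereas the paper's sinusoidal weighted cepstrum estimator, multi-peak, and Thomson methods each use their own tapers and weights. The function names, the 400-sample frame, the 512-point FFT, and the choice of 6 tapers are all illustrative assumptions. On a white-noise frame, whose true spectrum is flat, the spread of the estimate across frequency bins serves as a rough proxy for estimator variance.

```python
import numpy as np

def sine_tapers(frame_len, num_tapers):
    """Orthonormal sine tapers, one per row (assumed taper family)."""
    n = np.arange(1, frame_len + 1)
    j = np.arange(1, num_tapers + 1)[:, None]
    return np.sqrt(2.0 / (frame_len + 1)) * np.sin(np.pi * j * n / (frame_len + 1))

def hamming_spectrum(frame, nfft):
    """Single-taper estimate: squared magnitude of the windowed FFT."""
    w = np.hamming(len(frame))
    w = w / np.sqrt(np.sum(w ** 2))  # unit energy, comparable to the tapers
    return np.abs(np.fft.rfft(w * frame, nfft)) ** 2

def multitaper_spectrum(frame, nfft, num_tapers=6, weights=None):
    """Weighted average of the per-taper spectra (eigenspectra)."""
    tapers = sine_tapers(len(frame), num_tapers)
    if weights is None:
        # Uniform weights are an assumption made for this sketch.
        weights = np.full(num_tapers, 1.0 / num_tapers)
    eigenspectra = np.abs(np.fft.rfft(tapers * frame, nfft, axis=1)) ** 2
    return weights @ eigenspectra

# White noise has a flat true spectrum, so fluctuation across bins
# reflects the variance of the estimator rather than signal structure.
rng = np.random.default_rng(0)
frame = rng.standard_normal(400)
s_ham = hamming_spectrum(frame, 512)
s_mt = multitaper_spectrum(frame, 512)
cv_ham = s_ham.std() / s_ham.mean()
cv_mt = s_mt.std() / s_mt.mean()
print(f"coefficient of variation: Hamming {cv_ham:.2f}, multitaper {cv_mt:.2f}")
```

In an MFCC front end, either spectrum would then pass through the usual mel filterbank, logarithm, and discrete cosine transform; only the spectrum estimation step differs.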
Pages: 533-544
Number of pages: 12