Recognition of Human Speech Emotion Using Variants of Mel-Frequency Cepstral Coefficients

Cited by: 17
Authors
Palo, Hemanta Kumar [1]
Chandra, Mahesh [2]
Mohanty, Mihir Narayan [1]
Affiliations
[1] Siksha O Anusandhan Univ, Dept Elect & Commun Engn, Bhubaneswar, Odisha, India
[2] Birla Inst Technol, Dept Elect & Commun Engn, Ranchi, Bihar, India
Keywords
Human speech emotion; Mel-frequency cepstral coefficient; Probabilistic neural network; Feature extraction; Wavelet analysis
DOI
10.1007/978-981-10-4762-6_47
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This chapter investigates variants of Mel-frequency cepstral coefficients (MFCCs) for describing human speech emotions. The features are tested and compared for robustness in terms of classification accuracy and mean square error. Although MFCC is a reliable feature for speech emotion recognition, it does not capture the temporal dynamics between features, which are crucial for such analysis. To address this, delta MFCC, the first derivative of MFCC, is extracted for comparison. Because MFCC performs poorly under noisy conditions, both MFCC and delta MFCC features are extracted in the wavelet domain in the second phase. Combining the time-frequency characterization of emotions from wavelet analysis with the energy or amplitude information from MFCC-based features enhances the available information. Wavelet-based MFCCs (WMFCCs) and wavelet-based delta MFCCs (WDMFCCs) outperformed standard MFCCs, delta MFCCs, and wavelets in recognizing Berlin emotional speech utterances. A probabilistic neural network (PNN) was chosen to model the emotions because this classifier is simple to train, much faster, and allows more flexible selection of the smoothing parameter than other neural network (NN) models. The highest accuracy, 80.79%, was observed with WDMFCCs, compared with 60.97% and 62.76% for MFCCs and wavelets, respectively.
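The abstract names two techniques without giving their details: delta features as a first derivative of MFCCs, and a PNN with a smoothing parameter. The sketch below illustrates both in plain Python under stated assumptions. The regression-style delta with a window of `N=2` frames, the Gaussian kernel, and the value `sigma=0.5` are common textbook choices, not settings confirmed by the chapter, and the function names `delta` and `pnn_predict` are hypothetical.

```python
import math

def delta(frames, N=2):
    """Regression-based first derivative over a list of feature vectors.

    frames: list of per-frame coefficient lists (frames x coefficients).
    N: half-width of the regression window (assumed value, not from the chapter).
    """
    T = len(frames)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        row = []
        for k in range(len(frames[0])):
            num = 0.0
            for n in range(1, N + 1):
                nxt = frames[min(t + n, T - 1)][k]  # repeat the last frame at the edge
                prv = frames[max(t - n, 0)][k]      # repeat the first frame at the edge
                num += n * (nxt - prv)
            row.append(num / denom)
        out.append(row)
    return out

def pnn_predict(x, train_x, train_y, sigma=0.5):
    """Classify x with a probabilistic neural network (Parzen-window estimate).

    Each training vector contributes one Gaussian kernel; the class whose
    average kernel response is largest wins. sigma is the smoothing parameter.
    """
    sums, counts = {}, {}
    for xi, yi in zip(train_x, train_y):
        d2 = sum((a - b) ** 2 for a, b in zip(x, xi))
        sums[yi] = sums.get(yi, 0.0) + math.exp(-d2 / (2 * sigma ** 2))
        counts[yi] = counts.get(yi, 0) + 1
    return max(sums, key=lambda c: sums[c] / counts[c])

# Toy data standing in for MFCC frames and labelled emotion feature vectors.
ramp = [[0.0], [1.0], [2.0], [3.0], [4.0]]
ramp_delta = delta(ramp)
train_x = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
train_y = ["a", "a", "b", "b"]
label_near_a = pnn_predict([0.2, 0.1], train_x, train_y)
label_near_b = pnn_predict([5.1, 5.4], train_x, train_y)
```

A PNN has no iterative training: the training vectors themselves act as the pattern layer, so `sigma` is the only value to tune. This is consistent with the abstract's remark that the classifier is simple and fast to train.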
Pages: 491-498
Number of pages: 8