Fusion of mel and gammatone frequency cepstral coefficients for speech emotion recognition using deep C-RNN

Cited by: 36
Authors
Kumaran, U. [1 ]
Radha Rammohan, S. [2 ]
Nagarajan, Senthil Murugan [3 ]
Prathik, A. [4 ]
Affiliations
[1] Amrita Vishwa Vidyapeetham, Sch Engn, Bengaluru, India
[2] Dr MGR Educ & Res Inst, Dept Comp Applicat, Chennai, Tamil Nadu, India
[3] Chandigarh Univ, Dept Comp Sci & Engn, AIT, Chandigarh, Punjab, India
[4] Vel Tech Rangarajan Dr Sagunthala R&D Inst Sci &, Dept Comp Sci & Engn, Chennai, Tamil Nadu, India
Keywords
Emotion recognition; Speech signals; MFCC; GFCC; DL; CNN; RNN; Spectral features; Twitter data; Classification
DOI
10.1007/s10772-020-09792-x
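Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809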
Abstract
Emotions play a significant role in human life. Recognizing human emotions involves numerous tasks in identifying the emotional features of speech signals. In this regard, Speech Emotion Recognition (SER) has applications in many fields, including education, health, forensics, defense, robotics, and scientific research. However, SER is limited by data labeling, misinterpretation of speech, audio annotation, and time complexity. This work evaluates SER based on features extracted as Mel Frequency Cepstral Coefficients (MFCC) and Gammatone Frequency Cepstral Coefficients (GFCC) to study emotions from different versions of audio signals. In the feature extraction stage, the sound signals are segmented, and each frequency component is extracted and parametrized using MFCC, GFCC, and their combination (M-GFCC). Building on recent advances in Deep Learning, this paper proposes a Deep Convolutional-Recurrent Neural Network (Deep C-RNN) for the classification stage to learn emotion variations effectively. A fusion of Mel and Gammatone filters is used in the convolutional layers to first extract high-level spectral features; recurrent layers are then adopted to learn the long-term temporal context from these high-level features. The proposed work also differentiates emotional speech from neutral speech, illustrated with binary tree diagrams. The methodology is applied to a large dataset, the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). Finally, the proposed approach, which achieves an accuracy above 80% with lower loss, is compared with state-of-the-art approaches, and the experimental results provide evidence that the fusion-based approach performs better at recognizing emotions from speech signals.
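The abstract describes extracting MFCC and GFCC features and fusing them (M-GFCC) before classification. The record does not give the paper's exact settings, so the following is only a minimal pure-Python sketch under stated assumptions: 16 kHz audio, 256-sample frames with a 128-sample hop, 26 filters, 13 cepstral coefficients, a frequency-domain approximation of 4th-order gammatone filters placed on the Glasberg and Moore ERB-rate scale, cube-root compression for GFCC (log compression for MFCC), and fusion by simple frame-level concatenation. All function names (`m_gfcc`, `mel_weights`, `gammatone_weights`, and so on) are hypothetical and do not come from the paper.

```python
import math

def hz_to_mel(f):
    """HTK-style mel scale."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def hz_to_erb_rate(f):
    """Glasberg and Moore (1990) ERB-rate scale."""
    return 21.4 * math.log10(1.0 + 0.00437 * f)

def erb_rate_to_hz(e):
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437

def power_spectrum(frame):
    """Power of DFT bins 0..N/2 of a Hann-windowed frame (naive O(N^2) DFT)."""
    n = len(frame)
    win = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
           for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        re = sum(w * math.cos(2 * math.pi * k * i / n) for i, w in enumerate(win))
        im = sum(w * math.sin(2 * math.pi * k * i / n) for i, w in enumerate(win))
        spec.append((re * re + im * im) / n)
    return spec

def mel_weights(n_filters, bin_freqs, sr):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    top = hz_to_mel(sr / 2)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    return [[max(0.0, min((f - lo) / (c - lo), (hi - f) / (hi - c))) for f in bin_freqs]
            for lo, c, hi in zip(edges, edges[1:], edges[2:])]

def gammatone_weights(n_filters, bin_freqs, sr, order=4, f_min=50.0):
    """Approximate squared magnitude responses of gammatone filters on an ERB-rate grid."""
    lo, hi = hz_to_erb_rate(f_min), hz_to_erb_rate(sr / 2)
    centres = [erb_rate_to_hz(lo + (hi - lo) * i / (n_filters - 1)) for i in range(n_filters)]
    bank = []
    for fc in centres:
        b = 1.019 * 24.7 * (0.00437 * fc + 1.0)  # bandwidth derived from the ERB at fc
        bank.append([(1.0 + ((f - fc) / b) ** 2) ** (-order) for f in bin_freqs])
    return bank

def dct_ii(x, n_coeffs):
    """Unnormalized DCT-II, keeping the first n_coeffs coefficients."""
    m = len(x)
    return [sum(v * math.cos(math.pi * j * (i + 0.5) / m) for i, v in enumerate(x))
            for j in range(n_coeffs)]

def m_gfcc(signal, sr=16000, frame_len=256, hop=128, n_filters=26, n_ceps=13):
    """Hypothetical M-GFCC: per frame, concatenate MFCC and GFCC (2 * n_ceps values)."""
    bin_freqs = [k * sr / frame_len for k in range(frame_len // 2 + 1)]
    mel_bank = mel_weights(n_filters, bin_freqs, sr)
    gt_bank = gammatone_weights(n_filters, bin_freqs, sr)
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        p = power_spectrum(signal[start:start + frame_len])
        mel_e = [sum(w * q for w, q in zip(row, p)) for row in mel_bank]
        gt_e = [sum(w * q for w, q in zip(row, p)) for row in gt_bank]
        mfcc = dct_ii([math.log(e + 1e-10) for e in mel_e], n_ceps)  # log compression
        gfcc = dct_ii([e ** (1.0 / 3.0) for e in gt_e], n_ceps)      # cube-root compression
        feats.append(mfcc + gfcc)  # fusion by concatenation (an assumption)
    return feats

# Usage: 800 samples of a 440 Hz tone give 5 frames of 26 fused features each.
sr = 16000
tone = [0.5 * math.sin(2 * math.pi * 440 * t / sr) for t in range(800)]
features = m_gfcc(tone, sr)
print(len(features), len(features[0]))  # prints "5 26"
```

In a C-RNN like the one the abstract outlines, a sequence of such per-frame vectors (or the underlying filterbank energies) would be the kind of time-frequency input the convolutional layers consume before the recurrent layers model temporal context; how the paper actually arranges that input is not stated in this record.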
Pages: 303-314
Number of pages: 12
Related papers
  • [1] Fusion of mel and gammatone frequency cepstral coefficients for speech emotion recognition using deep C-RNN
    Kumaran, U.
    Radha Rammohan, S.
    Nagarajan, Senthil Murugan
    Prathik, A.
    International Journal of Speech Technology, 2021, 24 (02): 303-314
  • [2] Speech Emotion Recognition Using Gammatone Cepstral Coefficients and Deep Learning Features
    Sharan, Roneel V.
    2023 IEEE International Conference on Machine Learning and Applied Network Technologies (ICMLANT), 2023: 139-142
  • [3] Recognition of Human Speech Emotion Using Variants of Mel-Frequency Cepstral Coefficients
    Palo, Hemanta Kumar
    Chandra, Mahesh
    Mohanty, Mihir Narayan
    Advances in Systems, Control and Automation, 2018, 442: 491-498
  • [4] Emotion Recognition from Speech Signal Using Mel-Frequency Cepstral Coefficients
    Korkmaz, Onur Erdem
    Atasoy, Ayten
    2015 9th International Conference on Electrical and Electronics Engineering (ELECO), 2015: 1254-1257
  • [5] Speech Emotion Recognition using Mel Frequency Cepstral Coefficient and SVM Classifier
    Fernandes, V.
    Mascarehnas, L.
    Mendonca, C.
    Johnson, A.
    Mishra, R.
    Proceedings of the 2018 International Conference on System Modeling & Advancement in Research Trends (SMART), 2018: 200-204
  • [6] Chip design of mel frequency cepstral coefficients for speech recognition
    Wang, J. C.
    Wang, J. F.
    Weng, Y. S.
    2000 IEEE International Conference on Acoustics, Speech, and Signal Processing, Proceedings, Vols I-VI, 2000: 3658-3661
  • [7] Gammatone Wavelet Cepstral Coefficients for Robust Speech Recognition
    Adiga, Aniruddha
    Magimai-Doss, Mathew
    Seelamantula, Chandra Sekhar
    2013 IEEE International Conference of IEEE Region 10 (TENCON), 2013
  • [8] Breathing site classification via joint mel frequency cepstral coefficients and gammatone frequency cepstral coefficients approach
    Zhang, Jiarui
    Ling, Bingo Wing-Kuen
    Journal of Intelligent & Fuzzy Systems, 2024, 46 (02): 3623-3634
  • [9] Linear Frequency Residual Cepstral Coefficients for Speech Emotion Recognition
    Hora, Baveet Singh
    Uthiraa, S.
    Patil, Hemant A.
    Speech and Computer, SPECOM 2023, Pt I, 2023, 14338: 116-129