Speech Emotion Recognition using MFCC and Hybrid Neural Networks

Cited by: 2
Authors
Badr, Youakim [1]
Mukherjee, Partha [1]
Thumati, Sindhu [1]
Affiliations
[1] Penn State Univ, Great Valley, PA 19355 USA
Keywords
Hybrid Neural Network; Speech Emotion Recognition; MFCC; ConvLSTM; RAVDESS Data; CLASSIFICATION;
DOI
10.5220/0010707400003063
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Speech emotion recognition is a challenging task, and feature extraction plays an important role in classifying speech into different emotions. In this paper, we use Mel-frequency cepstral coefficients (MFCC), a traditional technique, to extract features from audio files. Instead of classifying audio files with traditional machine learning approaches such as SVM, we investigate different neural network architectures. Our baseline model, implemented as a convolutional neural network, achieves 60% classification accuracy. We propose a hybrid neural network architecture based on Convolutional and Long Short-Term Memory (ConvLSTM) networks to capture the spatial and sequential information in audio files. Our experimental results show that our ConvLSTM model achieves an accuracy of 59%. We then improved the model with data augmentation techniques and retrained it on the augmented dataset. The classification accuracy reaches 91% for multi-class classification on the RAVDESS dataset, outperforming state-of-the-art multi-class classification models that use similar data.
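The abstract credits data augmentation for the jump to 91% accuracy but does not name the techniques used. As a rough illustration only, the sketch below assumes two common waveform augmentations, additive Gaussian noise and a circular time shift; the function names, the noise level, and the synthetic test tone are all hypothetical and are not taken from the paper.

```python
import math
import random


def add_noise(signal, noise_level=0.005, rng=None):
    """Return a copy of `signal` with zero-mean Gaussian noise added.

    The noise standard deviation is `noise_level` times the peak amplitude
    (an assumed scaling, not the paper's setting).
    """
    if rng is None:
        rng = random.Random()
    peak = max(abs(s) for s in signal)
    return [s + rng.gauss(0.0, noise_level * peak) for s in signal]


def time_shift(signal, shift):
    """Circularly shift `signal` by `shift` samples (positive delays it)."""
    shift %= len(signal)
    if shift == 0:
        return list(signal)
    return signal[-shift:] + signal[:-shift]


def augment(signal, label, rng=None):
    """Return the original clip plus two augmented copies, all sharing `label`."""
    if rng is None:
        rng = random.Random(0)
    max_shift = max(1, len(signal) // 10)  # shift by at most 10% of the clip
    return [
        (list(signal), label),
        (add_noise(signal, rng=rng), label),
        (time_shift(signal, rng.randint(1, max_shift)), label),
    ]


# Synthetic stand-in for an audio clip: 0.1 s of a 440 Hz tone at 16 kHz.
sample_rate = 16000
tone = [math.sin(2 * math.pi * 440 * t / sample_rate)
        for t in range(sample_rate // 10)]
augmented = augment(tone, "happy")
```

Each augmented clip keeps the label of its source clip, so the training set grows without new annotation. In a pipeline like the one the abstract describes, features such as MFCC would then be extracted from every clip before training.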
Pages: 366-373
Number of pages: 8