Speech emotion recognition and classification using hybrid deep CNN and BiLSTM model

Authors
Swami Mishra
Nehal Bhatnagar
Prakasam P
Sureshkumar T. R
Affiliation
[1] Vellore Institute of Technology, School of Electronics Engineering
Keywords
Speech emotion recognition; Deep convolutional neural networks; LSTM; MFSC; Ensemble learning
DOI
Not available
Abstract
Accurate emotion detection from speech utterances has recently become a challenging and active research area. Speech emotion recognition (SER) systems play an essential role in human-machine interaction, virtual reality, emergency services, and many other real-time systems. SER remains an open problem because speakers from different regions and linguistic backgrounds express emotions quite differently. Conventional approaches classified emotions using low-level periodic features of the audio samples, such as energy and pitch, but these features were not accurate enough to detect emotions reliably and did not generalize well. Recent advances in computer vision and neural networks make it possible to extract high-level features and achieve more accurate recognition. This study proposes an ensemble deep CNN + Bi-LSTM framework for speech emotion recognition that classifies seven different emotions. Paralinguistic log Mel-frequency spectral coefficients (MFSC) are used as the input features to train the proposed architecture. The proposed hybrid model is validated on the TESS and SAVEE datasets, and experimental results show a classification accuracy of 96.36%. A comparison with existing models indicates that the proposed hybrid deep CNN and Bi-LSTM model outperforms them.
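The abstract names log Mel-frequency spectral coefficients (MFSC) as the input feature but gives no extraction settings. The Python sketch below shows one common way to compute log-MFSC: split the signal into frames, apply a Hamming window, take the power spectrum, pool it through triangular filters spaced evenly on the mel scale, and take the logarithm. Unlike MFCC, no DCT is applied afterwards. The sample rate, frame and hop lengths, FFT size, number of mel bands, the HTK mel formula, and the `eps` floor are illustrative assumptions, not the paper's configuration, and a direct DFT is used only to keep the example dependency-free.

```python
import math
from typing import List

# Illustrative settings (assumptions, not taken from the paper):
# 16 kHz audio, 25 ms frames, 10 ms hop, 512-point FFT, 40 mel bands.


def hz_to_mel(f: float) -> float:
    """HTK-style mel scale: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels: int, n_fft: int, sr: int) -> List[List[float]]:
    """Triangular filters evenly spaced on the mel scale between 0 Hz and sr/2.
    Returns one row of weights per band over the n_fft // 2 + 1 frequency bins."""
    lo, hi = hz_to_mel(0.0), hz_to_mel(sr / 2)
    edges = [mel_to_hz(lo + i * (hi - lo) / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_freqs = [k * sr / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for j in range(1, n_mels + 1):
        left, center, right = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for f in bin_freqs:
            if left <= f <= center:
                row.append((f - left) / (center - left))    # rising slope
            elif center < f <= right:
                row.append((right - f) / (right - center))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def power_spectrum(frame: List[float], n_fft: int) -> List[float]:
    """|DFT|^2 / n_fft for the non-negative frequency bins.
    A direct O(n^2) DFT keeps the sketch dependency-free; use an FFT in practice."""
    x = (frame + [0.0] * n_fft)[:n_fft]  # zero-pad or truncate to n_fft samples
    spec = []
    for k in range(n_fft // 2 + 1):
        re = sum(x[n] * math.cos(2 * math.pi * k * n / n_fft) for n in range(n_fft))
        im = -sum(x[n] * math.sin(2 * math.pi * k * n / n_fft) for n in range(n_fft))
        spec.append((re * re + im * im) / n_fft)
    return spec


def log_mfsc(signal: List[float], sr: int = 16000, frame_len: int = 400,
             hop: int = 160, n_fft: int = 512, n_mels: int = 40,
             eps: float = 1e-10) -> List[List[float]]:
    """Log mel filterbank energies: one row per frame, one column per mel band."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]  # Hamming window
    bank = mel_filterbank(n_mels, n_fft, sr)
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        spec = power_spectrum(frame, n_fft)
        energies = [sum(w * p for w, p in zip(row, spec)) for row in bank]
        feats.append([math.log(e + eps) for e in energies])  # eps avoids log(0)
    return feats


if __name__ == "__main__":
    sr = 16000
    tone = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(sr // 10)]  # 0.1 s of a 1 kHz tone
    feats = log_mfsc(tone, sr=sr)
    print(len(feats), len(feats[0]))  # prints: 8 40
```

Each row of the result is one frame and each column one mel band. Stacking the rows gives a 2-D time-frequency map that a CNN front end can treat like an image, while the frame axis remains available to a recurrent layer such as a Bi-LSTM. Skipping the DCT step keeps the local spectral correlations that convolutions can exploit, which is one common motivation for using MFSC rather than MFCC with CNNs.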
Pages: 37603-37620
Page count: 18
Citation
Mishra, Swami; Bhatnagar, Nehal; Prakasam, P.; Sureshkumar, T. R. Speech emotion recognition and classification using hybrid deep CNN and BiLSTM model [J]. Multimedia Tools and Applications, 2023, 83(13): 37603-37620