Speech emotion recognition and classification using hybrid deep CNN and BiLSTM model

Cited by: 0

Authors:

Swami Mishra

Nehal Bhatnagar

Prakasam P

Sureshkumar T. R

Affiliation:

[1] Vellore Institute of Technology, School of Electronics Engineering

Source:

Multimedia Tools and Applications | 2024 / Vol. 83

Keywords:

Speech emotion recognition; Deep convolutional neural networks; LSTM; MFSC; Ensemble learning

DOI:

Not available

Abstract:

Accurate emotion detection from speech utterances is a challenging and active research area. Speech emotion recognition (SER) systems play an essential role in human-machine interaction, virtual reality, emergency services, and many other real-time systems. The problem remains open because speakers from different regions and linguistic backgrounds convey emotions in very different ways. Conventional approaches classified emotions using low-level periodic features of the audio samples, such as energy and pitch, but these were neither accurate enough nor able to generalize. Recent advances in computer vision and neural networks make it possible to extract high-level features and achieve more accurate recognition. This study proposes an ensemble deep CNN + Bi-LSTM framework for speech emotion recognition that classifies seven different emotions. Paralinguistic log Mel-frequency spectral coefficients (MFSC) are used as the features to train the proposed architecture. The proposed hybrid model is validated on the TESS and SAVEE datasets. Experimental results indicate a classification accuracy of 96.36%. The proposed model is compared with existing models, and the comparison shows the superiority of the proposed hybrid deep CNN and Bi-LSTM model.
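
For readers unfamiliar with the MFSC feature named in the abstract, the following is a minimal sketch of how log Mel-frequency spectral coefficients are commonly computed: frame the signal, take the power spectrum, apply a triangular mel filterbank, and take the logarithm (no DCT step, which is what distinguishes MFSC from MFCC). The abstract does not state the authors' settings, so the sample rate, FFT size, hop length, number of mel bands, and all function names here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def hz_to_mel(f):
    # Common HTK-style mel scale formula.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_pts = mel_to_hz(mel_pts)
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, mid, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mfsc(signal, sr=16000, n_fft=512, hop=160, n_mels=40):
    # All default parameter values are assumptions chosen for illustration.
    signal = np.asarray(signal, dtype=float)
    if len(signal) < n_fft:
        signal = np.pad(signal, (0, n_fft - len(signal)))
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Build overlapping frames, then apply a Hann window to each.
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    mel_energy = power @ mel_filterbank(n_mels, n_fft, sr).T
    # Small constant avoids log(0) for silent frames.
    return np.log(mel_energy + 1e-10)
```

The result is a (frames x mel bands) matrix, the kind of two-dimensional time-frequency input that a convolutional front end can treat like an image before a recurrent layer models the frame sequence.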

Pages: 37603-37620

Page count: 17

50 items in total

[1] Speech emotion recognition and classification using hybrid deep CNN and BiLSTM model
Mishra, Swami
Bhatnagar, Nehal
Prakasam, P.
Sureshkumar, T. R.
[J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (13) : 37603 - 37620
[2] Speech Emotion Recognition Using CNN
Huang, Zhengwei
Dong, Ming
Mao, Qirong
Zhan, Yongzhao
[J]. PROCEEDINGS OF THE 2014 ACM CONFERENCE ON MULTIMEDIA (MM'14), 2014, : 801 - 804
[3] EEG-based emotion recognition using hybrid CNN and LSTM classification
Chakravarthi, Bhuvaneshwari
Ng, Sin-Chun
Ezilarasan, M. R.
Leung, Man-Fai
[J]. FRONTIERS IN COMPUTATIONAL NEUROSCIENCE, 2022, 16
[4] A BiLSTM-Transformer and 2D CNN Architecture for Emotion Recognition from Speech
Kim, Sera
Lee, Seok-Pil
[J]. ELECTRONICS, 2023, 12 (19)
[5] A HYBRID CNN-BILSTM MODEL FOR DRUG NAMED ENTITY RECOGNITION
Fudholi, Dhomas Hatta
Nayoan, Royan Abida N.
Hidayatullah, Ahmad Fathan
Arianto, Dede Brahma
[J]. JOURNAL OF ENGINEERING SCIENCE AND TECHNOLOGY, 2022, 17 (01) : 730 - 744
[6] Hybrid Time Distributed CNN-transformer for Speech Emotion Recognition
Slimi, Anwer
Nicolas, Henri
Zrigui, Mounir
[J]. PROCEEDINGS OF THE 17TH INTERNATIONAL CONFERENCE ON SOFTWARE TECHNOLOGIES (ICSOFT), 2022, : 602 - 611
[7] Speech emotion recognition using feature fusion: a hybrid approach to deep learning
Khan, Waleed Akram
ul Qudous, Hamad
Farhan, Asma Ahmad
[J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (31) : 75557 - 75584
[8] Speech emotion recognition by using complex MFCC and deep sequential model
Suprava Patnaik
[J]. Multimedia Tools and Applications, 2023, 82 : 11897 - 11922
[9] Clustering-Based Speech Emotion Recognition by Incorporating Learned Features and Deep BiLSTM
Mustaqeem
Sajjad, Muhammad
Kwon, Soonil
[J]. IEEE ACCESS, 2020, 8 : 79861 - 79875
[10] Speech emotion recognition by using complex MFCC and deep sequential model
Patnaik, Suprava
[J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (08) : 11897 - 11922
