XEmoAccent: Embracing Diversity in Cross-Accent Emotion Recognition Using Deep Learning

Times cited: 0
Authors
Ahmad, Raheel [1]
Iqbal, Arshad [1]
Jadoon, Muhammad Mohsin [1]
Ahmad, Naveed [2]
Javed, Yasir [2]
Affiliations
[1] Pak Austria Fachhsch Inst Appl Sci & Technol PAF I, Sino Pak Ctr Artificial Intelligence SPCAI, Mang 22620, Haripur, Pakistan
[2] Prince Sultan Univ, Dept Comp Sci, Riyadh 11586, Saudi Arabia
Source
IEEE ACCESS | 2024 / Vol. 12 / pp. 41125-41142
Keywords
deep learning; speech emotion recognition (SER); random forest (RF); logistic regression (LR); decision tree (DT); support vector machines (SVM); K-nearest neighbors (KNN); 1-dimensional convolutional neural networks (1D-CNN); machine learning; features
DOI
10.1109/ACCESS.2024.3376379
Chinese Library Classification (CLC) code
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Speech is a powerful means of expressing thoughts, emotions, and perspectives. However, accurately determining the emotions conveyed through speech remains a challenging task. Existing manual methods for analyzing speech to recognize emotions are prone to errors, limiting our ability to understand and respond to individuals' emotional states. To address diverse accents, an automated system capable of real-time emotion prediction from human speech is needed. This paper introduces a speech emotion recognition (SER) system that leverages supervised learning techniques to tackle cross-accent diversity. Distinctively, the system extracts a comprehensive set of nine speech features for refined speech signal processing and recognition: Zero Crossing Rate, Mel Spectrum, Pitch, Root Mean Square values, Mel Frequency Cepstral Coefficients, chroma-stft, and three spectral features (Centroid, Contrast, and Roll-off). Seven machine learning models are employed: Random Forest, Logistic Regression, Decision Tree, Support Vector Machines, Gaussian Naive Bayes, K-Nearest Neighbors, and ensemble learning. In addition, four individual and hybrid deep learning models are used, including Long Short-Term Memory (LSTM) and 1-Dimensional Convolutional Neural Network (1D-CNN) with stratified cross-validation. Audio samples from diverse English-speaking regions are combined to train the models. The performance evaluation shows that, among the conventional machine learning models, the Random Forest-based feature selection model achieves the highest accuracy, up to 76%, while the 1D-CNN model with stratified cross-validation reaches up to 99% accuracy. The proposed framework improves cross-accent emotion recognition accuracy to as much as 86.3%, 89.87%, 90.27%, and 84.96%, by margins of 14.71%, 10.15%, 9.6%, and 16.52%, respectively.
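Two of the nine features named in the abstract, Zero Crossing Rate and Root Mean Square, have simple textbook definitions that can be sketched in a few lines. The following is a minimal illustration only and is not taken from the paper: the function names, the frame length of 400 samples, the hop of 160 samples, and the choice to average each feature over frames are all assumptions made for this example.

```python
import math


def frame_signal(samples, frame_length, hop_length):
    """Split a sample list into overlapping frames (no padding)."""
    last_start = len(samples) - frame_length
    return [
        samples[start:start + frame_length]
        for start in range(0, last_start + 1, hop_length)
    ]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def rms(frame):
    """Root mean square amplitude of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def extract_features(samples, frame_length=400, hop_length=160):
    """Return [mean ZCR, mean RMS] over all frames.

    frame_length and hop_length are illustrative defaults (assumed),
    not values reported in the abstract.
    """
    frames = frame_signal(samples, frame_length, hop_length)
    if not frames:
        raise ValueError("signal is shorter than one frame")
    zcr_values = [zero_crossing_rate(f) for f in frames]
    rms_values = [rms(f) for f in frames]
    return [
        sum(zcr_values) / len(zcr_values),
        sum(rms_values) / len(rms_values),
    ]


if __name__ == "__main__":
    # Synthetic input: one second of a 440 Hz unit-amplitude sine at 16 kHz.
    sample_rate = 16000
    tone = [
        math.sin(2 * math.pi * 440 * n / sample_rate)
        for n in range(sample_rate)
    ]
    mean_zcr, mean_rms = extract_features(tone)
    print(f"mean ZCR: {mean_zcr:.4f}, mean RMS: {mean_rms:.4f}")
```

For a unit-amplitude sine the RMS value should come out close to 1/sqrt(2), which gives a quick sanity check. The remaining features in the abstract (Mel spectrum, MFCCs, chroma, spectral centroid, contrast, and roll-off) need a short-time Fourier transform and would normally come from an audio library rather than hand-written code.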
Pages: 41125-41142
Page count: 18