XEmoAccent: Embracing Diversity in Cross-Accent Emotion Recognition Using Deep Learning

Cited by: 0
Authors
Ahmad, Raheel [1 ]
Iqbal, Arshad [1 ]
Jadoon, Muhammad Mohsin [1 ]
Ahmad, Naveed [2 ]
Javed, Yasir [2 ]
Affiliations
[1] Pak Austria Fachhsch Inst Appl Sci & Technol PAF I, Sino Pak Ctr Artificial Intelligence SPCAI, Mang 22620, Haripur, Pakistan
[2] Prince Sultan Univ, Dept Comp Sci, Riyadh 11586, Saudi Arabia
Source
IEEE ACCESS | 2024, Vol. 12, pp. 41125-41142
Keywords
deep learning; speech emotion recognition (SER); random forest (RF); logistic regression (LR); decision tree (DT); support vector machines (SVM); K-nearest neighbors (KNN); 1-dimensional convolutional neural networks (1D-CNN); machine learning; features
DOI
10.1109/ACCESS.2024.3376379
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
Speech is a powerful means of expressing thoughts, emotions, and perspectives. However, accurately determining the emotions conveyed through speech remains a challenging task. Existing manual methods for analyzing speech to recognize emotions are prone to errors, limiting our understanding of and response to individuals' emotional states. To handle diverse accents, an automated system capable of real-time emotion prediction from human speech is needed. This paper introduces a speech emotion recognition (SER) system that leverages supervised learning techniques to tackle cross-accent diversity. Distinctively, the system extracts a comprehensive set of nine speech features (Zero Crossing Rate, Mel spectrum, pitch, Root Mean Square values, Mel Frequency Cepstral Coefficients, Chroma-STFT, and three spectral features: centroid, contrast, and roll-off) for refined speech signal processing and recognition. Seven machine learning models are employed (Random Forest, Logistic Regression, Decision Tree, Support Vector Machines, Gaussian Naive Bayes, K-Nearest Neighbors, and ensemble learning), along with four individual and hybrid deep learning models based on Long Short-Term Memory (LSTM) networks and 1-Dimensional Convolutional Neural Networks (1D-CNN) with stratified cross-validation. Audio samples from diverse English-speaking regions are combined to train the models. The performance evaluation of the conventional machine learning and deep learning models indicates that the Random Forest-based feature selection model achieves the highest accuracy among the conventional machine learning models, up to 76%, while the 1D-CNN model with stratified cross-validation reaches up to 99% accuracy. The proposed framework improves cross-accent emotion recognition accuracy to 86.3%, 89.87%, 90.27%, and 84.96%, gains of 14.71%, 10.15%, 9.6%, and 16.52%, respectively.
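The abstract describes a concrete pipeline: nine hand-crafted acoustic features extracted per utterance from a pooled multi-accent corpus, then classifiers evaluated with stratified cross-validation. The sketch below is not the authors' released code; it shows one plausible realization of that pipeline using librosa and scikit-learn. The feature settings (e.g., n_mfcc=40), the YIN pitch-tracking range, the Random Forest hyperparameters, and the helper names extract_features and evaluate are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch of the feature-extraction + stratified cross-validation pipeline
# described in the abstract. Assumes librosa and scikit-learn; all specific
# parameter values below are illustrative, not from the paper.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score


def extract_features(path: str, sr: int = 22050) -> np.ndarray:
    """Summarize one utterance with the nine feature families named in the
    abstract, each averaged over time into a fixed-length vector."""
    y, sr = librosa.load(path, sr=sr)
    feats = [
        librosa.feature.zero_crossing_rate(y),                   # Zero Crossing Rate
        librosa.feature.melspectrogram(y=y, sr=sr),               # Mel spectrum
        librosa.yin(y, fmin=65.0, fmax=2093.0, sr=sr)[None, :],   # pitch (YIN f0 track)
        librosa.feature.rms(y=y),                                  # Root Mean Square values
        librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40),               # MFCCs
        librosa.feature.chroma_stft(y=y, sr=sr),                   # Chroma-STFT
        librosa.feature.spectral_centroid(y=y, sr=sr),             # spectral centroid
        librosa.feature.spectral_contrast(y=y, sr=sr),             # spectral contrast
        librosa.feature.spectral_rolloff(y=y, sr=sr),              # spectral roll-off
    ]
    # Mean-pool each feature over its time axis and concatenate into one vector.
    return np.concatenate([f.mean(axis=1) for f in feats])


def evaluate(paths, labels, n_splits: int = 5) -> float:
    """Stratified k-fold accuracy of a Random Forest on the pooled,
    multi-accent feature matrix (one of the conventional models compared)."""
    X = np.stack([extract_features(p) for p in paths])
    y = np.asarray(labels)
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    clf = RandomForestClassifier(n_estimators=300, random_state=0)
    return cross_val_score(clf, X, y, cv=cv, scoring="accuracy").mean()
```

Stratified folds preserve the per-emotion class proportions in every split, which matters when emotion labels are imbalanced across the pooled accent corpora; the same cross-validation object could be reused to evaluate the LSTM or 1D-CNN variants in place of the Random Forest.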
Pages: 41125-41142
Number of pages: 18