XEmoAccent: Embracing Diversity in Cross-Accent Emotion Recognition Using Deep Learning

Cited by: 0
Authors
Ahmad, Raheel [1 ]
Iqbal, Arshad [1 ]
Jadoon, Muhammad Mohsin [1 ]
Ahmad, Naveed [2 ]
Javed, Yasir [2 ]
Affiliations
[1] Pak Austria Fachhsch Inst Appl Sci & Technol PAF I, Sino Pak Ctr Artificial Intelligence SPCAI, Mang 22620, Haripur, Pakistan
[2] Prince Sultan Univ, Dept Comp Sci, Riyadh 11586, Saudi Arabia
Source
IEEE ACCESS | 2024, Vol. 12, pp. 41125-41142
Keywords
deep learning; speech emotion recognition (SER); random forest (RF); logistic regression (LR); decision tree (DT); support vector machines (SVM); K-nearest neighbors (KNN); 1-dimensional convolutional neural networks (1D-CNN); machine learning; features
DOI
10.1109/ACCESS.2024.3376379
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
Speech is a powerful means of expressing thoughts, emotions, and perspectives. However, accurately determining the emotions conveyed through speech remains a challenging task. Existing manual methods for analyzing speech to recognize emotions are prone to errors, limiting our understanding of and response to individuals' emotional states. To handle diverse accents, an automated system capable of real-time emotion prediction from human speech is needed. This paper introduces a speech emotion recognition (SER) system that leverages supervised learning techniques to tackle cross-accent diversity. Distinctively, the system extracts a comprehensive set of nine speech features (Zero Crossing Rate, Mel spectrum, pitch, Root Mean Square values, Mel Frequency Cepstral Coefficients, Chroma-STFT, and three spectral features: centroid, contrast, and roll-off) for refined speech signal processing and recognition. Seven machine learning models are employed (Random Forest, Logistic Regression, Decision Tree, Support Vector Machines, Gaussian Naive Bayes, K-Nearest Neighbors, and ensemble learning), along with four individual and hybrid deep learning models based on Long Short-Term Memory (LSTM) networks and 1-Dimensional Convolutional Neural Networks (1D-CNN) with stratified cross-validation. Audio samples from diverse English-speaking regions are combined to train the models. The performance evaluation of the conventional machine learning and deep learning models indicates that the Random Forest-based feature selection model achieves the highest accuracy among the conventional machine learning models, up to 76%, while the 1D-CNN model with stratified cross-validation reaches up to 99% accuracy. The proposed framework improves cross-accent emotion recognition accuracy to 86.3%, 89.87%, 90.27%, and 84.96%, gains of 14.71%, 10.15%, 9.6%, and 16.52%, respectively.
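The abstract describes a concrete pipeline: nine hand-crafted acoustic features extracted per utterance from a pooled multi-accent corpus, then classifiers evaluated with stratified cross-validation. The sketch below is not the authors' released code; it shows one plausible realization of that pipeline using librosa and scikit-learn. The feature settings (e.g., n_mfcc=40), the YIN pitch-tracking range, the Random Forest hyperparameters, and the helper names extract_features and evaluate are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch of the feature-extraction + stratified cross-validation pipeline
# described in the abstract. Assumes librosa and scikit-learn; all specific
# parameter values below are illustrative, not from the paper.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score


def extract_features(path: str, sr: int = 22050) -> np.ndarray:
    """Summarize one utterance with the nine feature families named in the
    abstract, each averaged over time into a fixed-length vector."""
    y, sr = librosa.load(path, sr=sr)
    feats = [
        librosa.feature.zero_crossing_rate(y),                   # Zero Crossing Rate
        librosa.feature.melspectrogram(y=y, sr=sr),               # Mel spectrum
        librosa.yin(y, fmin=65.0, fmax=2093.0, sr=sr)[None, :],   # pitch (YIN f0 track)
        librosa.feature.rms(y=y),                                  # Root Mean Square values
        librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40),               # MFCCs
        librosa.feature.chroma_stft(y=y, sr=sr),                   # Chroma-STFT
        librosa.feature.spectral_centroid(y=y, sr=sr),             # spectral centroid
        librosa.feature.spectral_contrast(y=y, sr=sr),             # spectral contrast
        librosa.feature.spectral_rolloff(y=y, sr=sr),              # spectral roll-off
    ]
    # Mean-pool each feature over its time axis and concatenate into one vector.
    return np.concatenate([f.mean(axis=1) for f in feats])


def evaluate(paths, labels, n_splits: int = 5) -> float:
    """Stratified k-fold accuracy of a Random Forest on the pooled,
    multi-accent feature matrix (one of the conventional models compared)."""
    X = np.stack([extract_features(p) for p in paths])
    y = np.asarray(labels)
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    clf = RandomForestClassifier(n_estimators=300, random_state=0)
    return cross_val_score(clf, X, y, cv=cv, scoring="accuracy").mean()
```

Stratified folds preserve the per-emotion class proportions in every split, which matters when emotion labels are imbalanced across the pooled accent corpora; the same cross-validation object could be reused to evaluate the LSTM or 1D-CNN variants in place of the Random Forest.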
Pages: 41125-41142
Number of pages: 18