XEmoAccent: Embracing Diversity in Cross-Accent Emotion Recognition Using Deep Learning

Cited by: 0
Authors
Ahmad, Raheel [1 ]
Iqbal, Arshad [1 ]
Jadoon, Muhammad Mohsin [1 ]
Ahmad, Naveed [2 ]
Javed, Yasir [2 ]
Affiliations
[1] Pak-Austria Fachhochschule: Institute of Applied Sciences and Technology (PAF-IAST), Sino-Pak Center for Artificial Intelligence (SPCAI), Mang 22620, Haripur, Pakistan
[2] Prince Sultan University, Department of Computer Science, Riyadh 11586, Saudi Arabia
Source
IEEE ACCESS | 2024, Vol. 12, pp. 41125-41142
Keywords
deep learning; speech emotion recognition (SER); random forest (RF); logistic regression (LR); decision tree (DT); support vector machines (SVM); K-nearest neighbors (KNN); 1-dimensional convolutional neural networks (1D-CNN); machine learning; features
DOI
10.1109/ACCESS.2024.3376379
CLC classification number
TP [Automation Technology, Computer Technology];
Subject classification code
0812;
Abstract
Speech is a powerful means of expressing thoughts, emotions, and perspectives. However, accurately determining the emotions conveyed through speech remains a challenging task. Existing manual methods for analyzing speech to recognize emotions are prone to errors, limiting our understanding of and response to individuals' emotional states. To address diverse accents, an automated system capable of real-time emotion prediction from human speech is needed. This paper introduces a speech emotion recognition (SER) system that leverages supervised learning techniques to tackle cross-accent diversity. Distinctively, the system extracts a comprehensive set of nine speech features: Zero Crossing Rate, Mel Spectrum, Pitch, Root Mean Square values, Mel Frequency Cepstral Coefficients, chroma-STFT, and three spectral features (Centroid, Contrast, and Roll-off) for refined speech signal processing and recognition. Seven machine learning models are employed (Random Forest, Logistic Regression, Decision Tree, Support Vector Machines, Gaussian Naive Bayes, K-Nearest Neighbors, and ensemble learning), along with four individual and hybrid deep learning models, including Long Short-Term Memory (LSTM) and the 1-Dimensional Convolutional Neural Network (1D-CNN), evaluated with stratified cross-validation. Audio samples from diverse English regions are combined to train the models. The performance evaluation results indicate that, among the conventional machine learning models, the Random Forest-based feature selection model achieves the highest accuracy of up to 76%, while the 1D-CNN model with stratified cross-validation reaches up to 99% accuracy. The proposed framework improves cross-accent emotion recognition accuracy to 86.3%, 89.87%, 90.27%, and 84.96%, by margins of 14.71%, 10.15%, 9.6%, and 16.52%, respectively.
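As a rough illustration of the pipeline described in the abstract, the sketch below extracts the nine named features with librosa, averages each over time into a fixed-length vector, and scores a Random Forest baseline with stratified k-fold cross-validation. This is a minimal sketch under assumed tooling (librosa, scikit-learn); the file paths, label lists, and hyperparameters are hypothetical and do not reproduce the authors' implementation.

```python
# Minimal sketch of the feature-extraction + classification pipeline.
# librosa/scikit-learn usage is assumed; paths, labels, and hyperparameters
# are illustrative placeholders, not the paper's settings.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

def extract_features(path):
    """Return a fixed-length vector of the nine features named in the abstract,
    each averaged over its time frames."""
    y, sr = librosa.load(path, sr=None)
    feats = [
        librosa.feature.zero_crossing_rate(y),                 # Zero Crossing Rate
        librosa.feature.melspectrogram(y=y, sr=sr),            # Mel spectrum
        librosa.yin(y, fmin=65, fmax=2093)[np.newaxis, :],     # Pitch (YIN f0 track)
        librosa.feature.rms(y=y),                              # Root Mean Square energy
        librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13),           # MFCCs
        librosa.feature.chroma_stft(y=y, sr=sr),               # Chroma-STFT
        librosa.feature.spectral_centroid(y=y, sr=sr),         # Spectral centroid
        librosa.feature.spectral_contrast(y=y, sr=sr),         # Spectral contrast
        librosa.feature.spectral_rolloff(y=y, sr=sr),          # Spectral roll-off
    ]
    # Mean over frames, then concatenate into one vector per utterance.
    return np.concatenate([f.mean(axis=-1).ravel() for f in feats])

# Hypothetical placeholders for the combined multi-accent English corpus;
# a real run needs enough samples per emotion class for the 5 folds below.
files = ["sample_angry.wav", "sample_happy.wav"]
labels = ["angry", "happy"]

X = np.vstack([extract_features(f) for f in files])
y = np.array(labels)

# One conventional baseline (Random Forest) scored with stratified k-fold CV.
clf = RandomForestClassifier(n_estimators=200, random_state=42)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
print(cross_val_score(clf, X, y, cv=cv).mean())
```

The deep-learning variants reported in the paper (LSTM, 1D-CNN) would replace the Random Forest with a sequence or convolutional model trained on the same or frame-level features; the stratified splitting shown here carries over unchanged.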
Pages: 41125-41142
Page count: 18