XEmoAccent: Embracing Diversity in Cross-Accent Emotion Recognition Using Deep Learning

Cited by: 0
Authors
Ahmad, Raheel [1 ]
Iqbal, Arshad [1 ]
Jadoon, Muhammad Mohsin [1 ]
Ahmad, Naveed [2 ]
Javed, Yasir [2 ]
Affiliations
[1] Pak-Austria Fachhochschule: Institute of Applied Sciences and Technology (PAF-IAST), Sino-Pak Center for Artificial Intelligence (SPCAI), Mang 22620, Haripur, Pakistan
[2] Prince Sultan University, Department of Computer Science, Riyadh 11586, Saudi Arabia
Source
IEEE ACCESS, 2024, Vol. 12, pp. 41125-41142
Keywords
deep learning; speech emotion recognition (SER); random forest (RF); logistic regression (LR); decision tree (DT); support vector machines (SVM); K-nearest neighbors (KNN); 1-dimensional convolutional neural networks (1D-CNN); machine learning; features
DOI
10.1109/ACCESS.2024.3376379
CLC Number
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Speech is a powerful means of expressing thoughts, emotions, and perspectives. However, accurately determining the emotions conveyed through speech remains a challenging task. Existing manual methods for analyzing speech to recognize emotions are prone to errors, limiting our understanding of, and response to, individuals' emotional states. To handle diverse accents, an automated system capable of real-time emotion prediction from human speech is needed. This paper introduces a speech emotion recognition (SER) system that leverages supervised learning techniques to tackle cross-accent diversity. Distinctively, the system extracts a comprehensive set of nine speech features for refined speech signal processing and recognition: Zero Crossing Rate, Mel Spectrum, Pitch, Root Mean Square values, Mel Frequency Cepstral Coefficients, chroma-stft, and three spectral features (Centroid, Contrast, and Roll-off). Seven machine learning models are employed: Random Forest, Logistic Regression, Decision Tree, Support Vector Machines, Gaussian Naive Bayes, K-Nearest Neighbors, and ensemble learning, along with four individual and hybrid deep learning models based on Long Short-Term Memory (LSTM) and 1-Dimensional Convolutional Neural Networks (1D-CNN) with stratified cross-validation. Audio samples from diverse English-speaking regions are combined to train the models. The performance evaluation results indicate that, among the conventional machine learning models, the Random Forest-based feature selection model achieves the highest accuracy of up to 76%, while the 1D-CNN model with stratified cross-validation reaches up to 99% accuracy. The proposed framework improves cross-accent emotion recognition accuracy to 86.3%, 89.87%, 90.27%, and 84.96%, gains of 14.71%, 10.15%, 9.6%, and 16.52%, respectively.
Pages: 41125-41142
Number of pages: 18
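
The abstract describes a nine-feature front end (Zero Crossing Rate, Mel spectrum, pitch, RMS, MFCCs, chroma-stft, and spectral centroid, contrast, and roll-off) feeding conventional classifiers and a 1D-CNN evaluated with stratified cross-validation. The Python sketch below is only an illustration of such a pipeline, not the authors' released code: it assumes librosa-style frame-level features collapsed to their temporal means, and uses a Random Forest with stratified 5-fold cross-validation as a stand-in for the paper's full model suite; file paths, labels, and all parameter values are assumptions.

```python
# Hypothetical sketch of the nine-feature extraction and stratified
# cross-validation described in the abstract (not the authors' code).
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

def extract_features(path: str, sr: int = 22050) -> np.ndarray:
    """Return one fixed-length vector covering the nine feature groups for a clip."""
    y, sr = librosa.load(path, sr=sr)
    zcr = librosa.feature.zero_crossing_rate(y)               # Zero Crossing Rate
    mel = librosa.feature.melspectrogram(y=y, sr=sr)          # Mel spectrum
    rms = librosa.feature.rms(y=y)                            # Root Mean Square energy
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)        # MFCCs
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)          # chroma-stft
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)  # Spectral Centroid
    contrast = librosa.feature.spectral_contrast(y=y, sr=sr)  # Spectral Contrast
    rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr)    # Spectral Roll-off
    f0 = librosa.yin(y, fmin=50, fmax=500, sr=sr)             # Pitch (fundamental frequency)
    # Collapse each frame-level feature to its temporal mean and concatenate.
    frame_feats = [zcr, mel, rms, mfcc, chroma, centroid, contrast, rolloff]
    return np.hstack([f.mean(axis=1) for f in frame_feats] + [[np.nanmean(f0)]])

# Hypothetical usage: `files` and `labels` would come from the combined
# cross-accent English corpora mentioned in the abstract.
# X = np.vstack([extract_features(f) for f in files])
# y = np.array(labels)
# clf = RandomForestClassifier(n_estimators=300, random_state=0)
# cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
# print("stratified 5-fold accuracy:", cross_val_score(clf, X, y, cv=cv).mean())
```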