Multi-Features Audio Extraction for Speech Emotion Recognition Based on Deep Learning

Cited by: 0
Authors
Gondohanindijo, Jutono [1 ]
Muljono [1 ]
Noersasongko, Edi [1 ]
Pujiono [1 ]
Setiadi, De Rosal Moses [1 ]
Affiliations
[1] Univ Dian Nuswantoro, Fac Comp Sci, Semarang, Indonesia
Keywords
Deep learning; multi-features extraction; RAVDESS; speech emotion recognition; classification
DOI
10.14569/IJACSA.2023.0140623
Chinese Library Classification
TP301 [theory and methods]
Discipline Code
081202
Abstract
The growing need for human interaction with computers is making the interaction process more advanced, for example through voice recognition. A voice command system should also account for the user's emotional state, because users indirectly treat computers much as they treat other humans. By recognizing a person's emotion type, the computer can adjust the feedback it gives, so that the human-computer interaction (HCI) process runs more naturally. Based on previous research, improving the accuracy of recognizing human emotion types remains a challenge, because not all emotion types are expressed equally, particularly across differences in language and cultural accent. This study proposes recognizing speech-based emotion types using multi-feature extraction and deep learning. The dataset is taken from the RAVDESS database, and features are extracted using MFCC, Chroma, Mel-Spectrogram, Contrast, and Tonnetz. PCA (Principal Component Analysis) and Min-Max normalization are then applied to determine the impact of these techniques. The data from the pre-processing stage are fed to a Deep Neural Network (DNN) model to identify emotion types: calm, happy, sad, angry, neutral, fearful, surprised, and disgusted. Model testing uses the confusion matrix to measure the performance of the proposed method. The DNN model achieved an accuracy of 93.61%, a sensitivity of 73.80%, and a specificity of 96.34%. The use of multi-features in the proposed method improves the model's accuracy in determining emotion type on the RAVDESS dataset.
In addition, applying PCA increases pattern correlation between features, so the classifier model shows performance improvements, especially in accuracy, specificity, and sensitivity.
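The preprocessing pipeline the abstract describes (stacked acoustic feature vectors, then Min-Max normalization, then PCA) can be sketched with NumPy. This is a minimal illustration, not the paper's implementation: the feature dimension 193 (40 MFCC + 12 Chroma + 128 Mel + 7 Contrast + 6 Tonnetz) and the 50 retained components are assumed values, and random data stands in for the extracted features.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the stacked per-file feature vectors; RAVDESS has 1440 speech
# files, and 193 dims is a common MFCC+Chroma+Mel+Contrast+Tonnetz stacking
# (40 + 12 + 128 + 7 + 6). Both are assumptions for illustration only.
X = rng.normal(size=(1440, 193))

# Min-Max normalization: rescale each feature column to the [0, 1] range.
X_min, X_max = X.min(axis=0), X.max(axis=0)
X_norm = (X - X_min) / (X_max - X_min)

# PCA via SVD of the mean-centered data, keeping k principal components
# (k = 50 is an arbitrary choice here, not a value from the paper).
k = 50
X_centered = X_norm - X_norm.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
X_pca = X_centered @ Vt[:k].T  # projected data, shape (1440, 50)

print(X_pca.shape)
```

The reduced matrix `X_pca` would then be the input to the DNN classifier; in practice the feature extraction itself would come from an audio library such as librosa rather than random data.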
Pages: 198 - 206
Page count: 9
Related Papers
50 records in total
  • [31] Speech Emotion Recognition based on Multi-Task Learning
    Zhao, Huijuan
    Han, Zhijie
    Wang, Ruchuan
    2019 IEEE 5TH INTL CONFERENCE ON BIG DATA SECURITY ON CLOUD (BIGDATASECURITY) / IEEE INTL CONFERENCE ON HIGH PERFORMANCE AND SMART COMPUTING (HPSC) / IEEE INTL CONFERENCE ON INTELLIGENT DATA AND SECURITY (IDS), 2019, : 186 - 188
  • [32] Multi-Modal Emotion Recognition From Speech and Facial Expression Based on Deep Learning
    Cai, Linqin
    Dong, Jiangong
    Wei, Min
    2020 CHINESE AUTOMATION CONGRESS (CAC 2020), 2020, : 5726 - 5729
  • [33] Multi-Modal Fusion Emotion Recognition Method of Speech Expression Based on Deep Learning
    Liu, Dong
    Wang, Zhiyong
    Wang, Lifeng
    Chen, Longxi
    FRONTIERS IN NEUROROBOTICS, 2021, 15
  • [34] Emotion Recognition in Speech with Deep Learning Architectures
    Erdal, Mehmet
    Kaechele, Markus
    Schwenker, Friedhelm
    ARTIFICIAL NEURAL NETWORKS IN PATTERN RECOGNITION, 2016, 9896 : 298 - 311
  • [35] Speech Emotion Recognition Using Deep Learning
    Alagusundari, N.
    Anuradha, R.
    ARTIFICIAL INTELLIGENCE: THEORY AND APPLICATIONS, VOL 1, AITA 2023, 2024, 843 : 313 - 325
  • [36] Speech Emotion Recognition Using Deep Learning
    Ahmed, Waqar
    Riaz, Sana
    Iftikhar, Khunsa
    Konur, Savas
    ARTIFICIAL INTELLIGENCE XL, AI 2023, 2023, 14381 : 191 - 197
  • [37] Multi-type features separating fusion learning for Speech Emotion Recognition
    Xu, Xinlei
    Li, Dongdong
    Zhou, Yijun
    Wang, Zhe
    APPLIED SOFT COMPUTING, 2022, 130
  • [38] Deep Learning Based Audio-Visual Emotion Recognition in a Smart Learning Environment
    Ivleva, Natalja
    Pentel, Avar
    Dunajeva, Olga
    Justsenko, Valeria
    TOWARDS A HYBRID, FLEXIBLE AND SOCIALLY ENGAGED HIGHER EDUCATION, VOL 1, ICL 2023, 2024, 899 : 420 - 431
  • [39] Deep temporal clustering features for speech emotion recognition
    Lin, Wei-Cheng
    Busso, Carlos
    SPEECH COMMUNICATION, 2024, 157
  • [40] Speech Emotion Recognition Based on Transfer Emotion-Discriminative Features Subspace Learning
    Zhang, Kexin
    Liu, Yunxiang
    IEEE ACCESS, 2023, 11 : 56336 - 56343