Multi-Features Audio Extraction for Speech Emotion Recognition Based on Deep Learning

Cited: 0
Authors
Gondohanindijo, Jutono [1 ]
Muljono [1 ]
Noersasongko, Edi [1 ]
Pujiono [1 ]
Setiadi, De Rosal Moses [1 ]
Affiliations
[1] Univ Dian Nuswantoro, Fac Comp Sci, Semarang, Indonesia
Keywords
Deep learning; multi-feature extraction; RAVDESS; speech emotion recognition; classification
DOI
10.14569/IJACSA.2023.0140623
Chinese Library Classification
TP301 [Theory, Methods]
Discipline code
081202
Abstract
The growing need for human interaction with computers is making that interaction more advanced, for example through voice recognition. A voice command system should also account for the user's emotional state, because users indirectly treat computers much as they treat other people. By recognizing a person's emotion, the computer can adapt the feedback it gives, so that human-computer interaction (HCI) proceeds more naturally. Prior research shows that improving the accuracy of recognizing human emotion types remains a challenge, because not all emotions are expressed in the same way, particularly across languages and cultural accents. This study proposes speech-based emotion recognition using multi-feature extraction and deep learning. The dataset is taken from the RAVDESS database and is processed with MFCC, Chroma, Mel-Spectrogram, Contrast, and Tonnetz feature extraction. PCA (Principal Component Analysis) and Min-Max normalization are then applied to determine the impact of these techniques. The pre-processed data are fed to a Deep Neural Network (DNN) model to identify eight emotion types: calm, happy, sad, angry, neutral, fearful, surprised, and disgusted. Model performance is evaluated with a confusion matrix; the DNN achieves an accuracy of 93.61%, a sensitivity of 73.80%, and a specificity of 96.34%. Using multiple features improves the model's accuracy in determining emotion type on the RAVDESS dataset. In addition, applying PCA strengthens the pattern correlation between features, so the classifier shows improvements in accuracy, specificity, and sensitivity.
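The pre-processing pipeline described in the abstract (stacked multi-feature vectors, Min-Max normalization, then PCA) can be sketched as follows. This is a minimal NumPy-only illustration, not the authors' implementation: the feature dimensions follow common librosa defaults for the five named feature types (the paper does not state them here), and random data stands in for the extracted RAVDESS features.

```python
import numpy as np

# Assumed per-utterance feature sizes (typical librosa defaults for
# MFCC, Chroma, Mel-Spectrogram, Spectral Contrast, and Tonnetz,
# each averaged over time); the actual dimensions used may differ.
FEATURE_DIMS = {"mfcc": 40, "chroma": 12, "mel": 128, "contrast": 7, "tonnetz": 6}
DIM = sum(FEATURE_DIMS.values())  # 193-dimensional stacked feature vector

def min_max_normalize(X):
    """Scale each feature column to the [0, 1] range."""
    mn, mx = X.min(axis=0), X.max(axis=0)
    span = np.where(mx - mn == 0, 1.0, mx - mn)  # avoid divide-by-zero
    return (X - mn) / span

def pca_project(X, n_components):
    """Project mean-centered data onto the top principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

rng = np.random.default_rng(0)
X = rng.normal(size=(100, DIM))        # stand-in for extracted audio features
Xn = min_max_normalize(X)              # Min-Max normalization step
Z = pca_project(Xn, n_components=50)   # PCA dimensionality reduction step
print(Z.shape)                         # (100, 50)
```

The reduced matrix `Z` would then be the input to the DNN classifier over the eight emotion classes.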
Pages: 198 - 206
Page count: 9