Multi-Features Audio Extraction for Speech Emotion Recognition Based on Deep Learning

Times Cited: 0
|
Authors
Gondohanindijo, Jutono [1 ]
Muljono [1 ]
Noersasongko, Edi [1 ]
Pujiono [1 ]
Setiadi, De Rosal Moses [1 ]
Affiliations
[1] Univ Dian Nuswantoro, Fac Comp Sci, Semarang, Indonesia
Keywords
Deep learning; multi-features extraction; RAVDESS; speech emotion recognition; classification
DOI
10.14569/IJACSA.2023.0140623
Chinese Library Classification
TP301 [Theory and Methods]
Discipline Code
081202
Abstract
The increasing need for human interaction with computers is making the interaction process more advanced, and one way to support this is voice recognition. A voice command system should also take the user's emotional state into account, because users tend to treat computers much as they treat other people. By recognizing the type of emotion a person expresses, the computer can adjust the feedback it gives, so that the human-computer interaction (HCI) process runs more naturally. Previous research shows that improving the accuracy of recognizing human emotion types remains a challenge, because not all emotion types are expressed in the same way, particularly across differences in language and cultural accent. This study proposes recognizing speech-based emotion types using multi-feature extraction and deep learning. The dataset is taken from the RAVDESS database, and each recording is converted into features using MFCC, Chroma, Mel-Spectrogram, Contrast, and Tonnetz. PCA (Principal Component Analysis) and Min-Max Normalization are then applied to assess the impact of these pre-processing techniques. The pre-processed data are fed to a Deep Neural Network (DNN) model to identify eight emotion types: calm, happy, sad, angry, neutral, fearful, surprised, and disgusted. Model performance is measured with a confusion matrix. The DNN model achieves an accuracy of 93.61%, a sensitivity of 73.80%, and a specificity of 96.34%. Using multiple features in the proposed method improves the model's accuracy in determining the emotion type on the RAVDESS dataset. In addition, applying PCA strengthens the pattern correlation between features, so the classifier model shows improved performance, particularly in accuracy, specificity, and sensitivity.
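The abstract names the pipeline stages but gives no implementation details. The sketch below shows how such a pipeline could look, assuming the common librosa, scikit-learn, and TensorFlow/Keras stack; the feature dimensions, the 95% PCA variance threshold, the layer sizes, and all other hyperparameters are illustrative assumptions, not the configuration reported in the paper.

```python
# Minimal sketch of the pipeline described in the abstract:
# multi-feature extraction -> Min-Max normalization -> PCA -> DNN classifier.
# Library choices and every hyperparameter here are assumptions for
# illustration, not the authors' reported setup.
import numpy as np
import librosa
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import PCA
import tensorflow as tf

EMOTIONS = ["calm", "happy", "sad", "angry",
            "neutral", "fearful", "surprised", "disgusted"]

def extract_features(path: str) -> np.ndarray:
    """Extract the five features named in the abstract and average each
    over time, yielding one fixed-length vector per audio file."""
    y, sr = librosa.load(path, sr=None)
    stft = np.abs(librosa.stft(y))
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)           # 40 dims
    chroma = librosa.feature.chroma_stft(S=stft, sr=sr)          # 12 dims
    mel = librosa.feature.melspectrogram(y=y, sr=sr)             # 128 dims
    contrast = librosa.feature.spectral_contrast(S=stft, sr=sr)  # 7 dims
    tonnetz = librosa.feature.tonnetz(
        y=librosa.effects.harmonic(y), sr=sr)                    # 6 dims
    return np.concatenate([f.mean(axis=1)
                           for f in (mfcc, chroma, mel, contrast, tonnetz)])

def preprocess(X: np.ndarray) -> np.ndarray:
    """Min-Max normalization followed by PCA, as in the abstract.
    The 95% explained-variance threshold is an assumption."""
    X = MinMaxScaler().fit_transform(X)
    return PCA(n_components=0.95).fit_transform(X)

def build_dnn(input_dim: int) -> tf.keras.Model:
    """A plain fully connected DNN; layer sizes are assumptions."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(input_dim,)),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(len(EMOTIONS), activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

def macro_sensitivity_specificity(cm: np.ndarray):
    """Macro-averaged sensitivity and specificity from a confusion matrix
    (rows = true class, cols = predicted), one-vs-rest per class, matching
    the kind of metrics the abstract reports."""
    tp = np.diag(cm).astype(float)
    fn = cm.sum(axis=1) - tp
    fp = cm.sum(axis=0) - tp
    tn = cm.sum() - tp - fn - fp
    return (tp / (tp + fn)).mean(), (tn / (tn + fp)).mean()

# Usage sketch (paths and integer labels would come from the RAVDESS
# file-naming convention):
#   X = np.stack([extract_features(p) for p in wav_paths])
#   X = preprocess(X)
#   model = build_dnn(X.shape[1])
#   model.fit(X, y_labels, epochs=100, validation_split=0.2)
#   # then: from sklearn.metrics import confusion_matrix
#   # cm = confusion_matrix(y_true, model.predict(X_test).argmax(axis=1))
```

Averaging each feature over time yields one 193-dimensional vector per clip (40 + 12 + 128 + 7 + 6) before PCA, which is the usual way to feed variable-length audio into a plain fully connected DNN.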
Pages: 198 - 206
Number of pages: 9
Related Papers
50 records in total
  • [21] Deep Learning Based Emotion Recognition from Chinese Speech
    Zhang, Weishan
    Zhao, Dehai
    Chen, Xiufeng
    Zhang, Yuanjie
    INCLUSIVE SMART CITIES AND DIGITAL HEALTH, 2016, 9677 : 49 - 58
  • [22] Feature Fusion of Speech Emotion Recognition Based on Deep Learning
    Liu, Gang
    He, Wei
    Jin, Bicheng
    PROCEEDINGS OF 2018 INTERNATIONAL CONFERENCE ON NETWORK INFRASTRUCTURE AND DIGITAL CONTENT (IEEE IC-NIDC), 2018, : 193 - 197
  • [23] Facial Expression Recognition Based on Multi-Features Cooperative Deep Convolutional Network
    Wu, Haopeng
    Lu, Zhiying
    Zhang, Jianfeng
    Li, Xin
    Zhao, Mingyue
    Ding, Xudong
APPLIED SCIENCES-BASEL, 2021, 11 (04): 1 - 14
  • [24] Multi-Features Fusion Based Face Recognition
    Long, Xianzhong
    Chen, Songcan
    NEURAL INFORMATION PROCESSING (ICONIP 2017), PT VI, 2017, 10639 : 540 - 549
  • [25] Learning Transferable Features for Speech Emotion Recognition
    Marczewski, Alison
    Veloso, Adriano
    Ziviani, Nivio
    PROCEEDINGS OF THE THEMATIC WORKSHOPS OF ACM MULTIMEDIA 2017 (THEMATIC WORKSHOPS'17), 2017, : 529 - 536
  • [26] Speech Emotion Recognition Using Gammatone Cepstral Coefficients and Deep Learning Features
Sharan, Roneel V.
    2023 IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLIED NETWORK TECHNOLOGIES, ICMLANT, 2023, : 139 - 142
  • [27] Speech Emotion Recognition Based on Two-Stream Deep Learning Model Using Korean Audio Information
    Jo, A-Hyeon
    Kwak, Keun-Chang
APPLIED SCIENCES-BASEL, 2023, 13 (04)
  • [28] Visual-audio emotion recognition based on multi-task and ensemble learning with multiple features
    Hao M.
    Cao W.-H.
    Liu Z.-T.
    Wu M.
    Xiao P.
NEUROCOMPUTING, 2020, 391: 42 - 51
  • [29] Combine Multi-features with Deep Learning for Answer Selection
    Zheng, Yuqing
    Zhang, Chenghe
    Zheng, Dequan
    Yu, Feng
    2017 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2017, : 91 - 94
  • [30] Learning Affective Features With a Hybrid Deep Model for Audio-Visual Emotion Recognition
    Zhang, Shiqing
    Zhang, Shiliang
    Huang, Tiejun
    Gao, Wen
    Tian, Qi
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2018, 28 (10) : 3030 - 3043