Multi-Features Audio Extraction for Speech Emotion Recognition Based on Deep Learning

Cited by: 0
Authors
Gondohanindijo, Jutono [1 ]
Muljono [1 ]
Noersasongko, Edi [1 ]
Pujiono [1 ]
Setiadi, De Rosal Moses [1 ]
Affiliations
[1] Univ Dian Nuswantoro, Fac Comp Sci, Semarang, Indonesia
Keywords
Deep learning; multi-features extraction; RAVDESS; speech emotion recognition; CLASSIFICATION;
DOI
10.14569/IJACSA.2023.0140623
CLC number
TP301 [Theory, Methods];
Discipline code
081202 ;
Abstract
The increasing need for human interaction with computers has made the interaction process more advanced, notably through voice recognition. A voice-command system should also take the user's emotional state into account, because users tend to treat computers much as they treat other humans. By recognizing the type of a person's emotion, the computer can adapt the feedback it gives, so that the human-computer interaction (HCI) process runs more naturally. Previous research shows that improving the accuracy of recognizing human emotion types remains a challenge, because not all emotions are expressed in the same way, particularly across languages and cultural accents. This study proposes speech-based emotion recognition using multi-feature extraction and deep learning. The dataset is taken from the RAVDESS database and is extracted using MFCC, Chroma, Mel-Spectrogram, Contrast, and Tonnetz features. In addition, PCA (Principal Component Analysis) and Min-Max normalization are applied to determine the impact of these techniques. The data obtained from the pre-processing stage are then used by a Deep Neural Network (DNN) model to identify eight emotion types: calm, happy, sad, angry, neutral, fearful, surprised, and disgusted. Model testing uses the confusion matrix to measure the performance of the proposed method. The DNN model achieves an accuracy of 93.61%, a sensitivity of 73.80%, and a specificity of 96.34%. The use of multiple features in the proposed method improves the model's accuracy in determining emotion type on the RAVDESS dataset. In addition, applying PCA also strengthens the pattern correlation between features, so the classifier model shows performance improvements, especially in accuracy, specificity, and sensitivity.
Pages: 198-206
Number of pages: 9