Emotions Classification from Speech with Deep Learning

Authors
Chowanda, Andry [1]
Muliono, Yohan [2]
Affiliations
[1] Bina Nusantara Univ, Sch Comp Sci, Comp Sci Dept, Jakarta 11480, Indonesia
[2] Bina Nusantara Univ, Sch Comp Sci, Comp Sci Dept, Cyber Secur Program, Jakarta 11480, Indonesia
Keywords
Emotions recognition; speech modality; temporal information; affective system; NEURAL-NETWORK
DOI
10.14569/IJACSA.2022.0130490
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Emotions are an essential part of social interaction: they convey meaning to the interlocutors. Hence, recognising emotions is paramount in building an affective system that can interact naturally with human interlocutors. However, recognising emotions from social interactions requires temporal information in order to classify the emotions correctly. This research proposes an architecture that extracts temporal information from the speech modality using a temporal Convolutional Neural Network (CNN) combined with a Long Short-Term Memory (LSTM) architecture. Several combinations and settings of the architectures were explored and are presented in the paper. The results show that the best classifier was the model trained with four CNN layers combined with one Bidirectional LSTM layer. Furthermore, this model was trained with an augmented training dataset containing seven times more data than the original training dataset. The best model achieved a training accuracy of 94.25%, a validation accuracy of 57.07%, a training loss of 0.2577 and a validation loss of 1.1678. Moreover, Neutral (Calm) and Happy are the easiest classes to recognise, while Angry is the hardest to classify.
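The abstract mentions an augmented training set but does not say which augmentations were applied. As a minimal sketch, assuming three common waveform perturbations (circular time shift, random gain and additive Gaussian noise), the Python below produces seven variants per clip. The function name `augment_waveform`, the 16 kHz sampling rate and every parameter value are illustrative assumptions, not details taken from the paper.

```python
import math
import random


def augment_waveform(samples, copies=7, noise_std=0.005, max_shift=160, seed=0):
    """Return `copies` perturbed variants of a mono waveform (list of floats).

    Hypothetical helper: the perturbations and defaults are assumptions,
    not the augmentation pipeline used by the paper's authors.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    rng = random.Random(seed)  # local generator keeps the output reproducible
    n = len(samples)
    variants = []
    for _ in range(copies):
        # Circular time shift, so the clip length stays unchanged.
        shift = rng.randint(-max_shift, max_shift) % n
        shifted = samples[-shift:] + samples[:-shift]
        # Random gain plus additive Gaussian noise, clipped to [-1, 1].
        gain = rng.uniform(0.8, 1.2)
        variants.append(
            [max(-1.0, min(1.0, gain * s + rng.gauss(0.0, noise_std))) for s in shifted]
        )
    return variants


# Usage: a 0.1 s, 440 Hz tone at an assumed 16 kHz sampling rate.
clip = [0.5 * math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
variants = augment_waveform(clip)
```

Whether the "seven times more data" in the abstract counts the original clips is not stated, so this sketch returns only the perturbed variants and leaves that choice to the caller.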
Pages: 777-781
Page count: 5