Emotions Classification from Speech with Deep Learning

Cited by: 0

Authors:

Chowanda, Andry [1]

Muliono, Yohan [2]

Affiliations:

[1] Bina Nusantara Univ, Sch Comp Sci, Comp Sci Dept, Jakarta 11480, Indonesia

[2] Bina Nusantara Univ, Sch Comp Sci, Comp Sci Dept, Cyber Secur Program, Jakarta 11480, Indonesia

Source:

INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS | 2022, Vol. 13, No. 04

Keywords:

Emotions recognition; speech modality; temporal information; affective system; NEURAL-NETWORK;

DOI:

10.14569/IJACSA.2022.0130490

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

Emotions are an essential part of conveying meaning to interlocutors during social interactions. Hence, recognising emotions is paramount in building an affective system that can interact naturally with human interlocutors. However, recognising emotions from social interactions requires temporal information in order to classify the emotions correctly. This research proposes an architecture that extracts temporal information from the speech modality using a temporal Convolutional Neural Network (CNN) model combined with the Long Short-Term Memory (LSTM) architecture. Several combinations and settings of the architectures were explored and are presented in the paper. The results show that the best classifier was achieved by the model trained with four CNN layers combined with one Bidirectional LSTM layer. Furthermore, the model was trained with an augmented training dataset containing seven times more data than the original training dataset. The best model achieved 94.25% training accuracy, 57.07% validation accuracy, a training loss of 0.2577 and a validation loss of 1.1678. Moreover, Neutral (Calm) and Happy are the easiest classes to recognise, while Angry is the hardest to classify.
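To make the shape of the best-performing configuration more concrete, the following is a minimal sketch that traces tensor shapes through four temporal (1D) convolution blocks followed by one bidirectional LSTM layer. Only the layer counts come from the abstract. Every other value here is an assumption chosen for illustration: the input length and feature count, the filter counts, kernel sizes, pooling factors, LSTM units and the number of emotion classes, as are the names `ConvBlock`, `conv_block_shape` and `trace_shapes`. It is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ConvBlock:
    """One temporal convolution block (all values are hypothetical)."""
    filters: int      # number of output channels
    kernel_size: int  # temporal width of the convolution
    pool_size: int    # max-pooling factor applied after the convolution


def conv_block_shape(steps: int, block: ConvBlock) -> Tuple[int, int]:
    """Shape after a 'valid', stride-1 1D convolution followed by max pooling."""
    after_conv = steps - block.kernel_size + 1
    if after_conv < 1:
        raise ValueError("input sequence is shorter than the kernel")
    after_pool = after_conv // block.pool_size
    if after_pool < 1:
        raise ValueError("sequence too short for the pooling factor")
    return after_pool, block.filters


def trace_shapes(
    steps: int,
    features: int,
    blocks: List[ConvBlock],
    lstm_units: int,
    num_classes: int,
) -> List[Tuple[str, Tuple[int, ...]]]:
    """Return (layer name, output shape) pairs, excluding the batch dimension."""
    shapes: List[Tuple[str, Tuple[int, ...]]] = [("input", (steps, features))]
    for i, block in enumerate(blocks, start=1):
        steps, features = conv_block_shape(steps, block)
        shapes.append((f"conv{i}", (steps, features)))
    # A bidirectional LSTM that returns only its final state concatenates
    # the forward and backward outputs, doubling the unit count.
    shapes.append(("bilstm", (2 * lstm_units,)))
    shapes.append(("softmax", (num_classes,)))
    return shapes


# Hypothetical configuration: 300 frames of 40 acoustic features per utterance.
EXAMPLE_BLOCKS = [
    ConvBlock(filters=32, kernel_size=5, pool_size=2),
    ConvBlock(filters=64, kernel_size=5, pool_size=2),
    ConvBlock(filters=128, kernel_size=5, pool_size=2),
    ConvBlock(filters=128, kernel_size=5, pool_size=2),
]

if __name__ == "__main__":
    for name, shape in trace_shapes(300, 40, EXAMPLE_BLOCKS, lstm_units=64, num_classes=8):
        print(f"{name:8s} {shape}")
```

Under these assumed settings, the convolution blocks shorten the time axis while widening the channel axis, so the recurrent layer sees a shorter sequence of richer features. The actual hyperparameters are reported in the paper, not in this record.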


Pages: 777-781

Page count: 5

