Understanding human emotions through speech spectrograms using deep neural network

Authors
Vedika Gupta
Stuti Juyal
Yu-Chen Hu
Affiliations
[1] Bharati Vidyapeeth’s College of Engineering, Department of Computer Science & Engineering
[2] Providence University, Department of Computer Science & Information Management
Source
Journal of Supercomputing, 2022, 78 (05): 6944 - 6973
Keywords
Bag of visual words (BoVW); Deep neural networks (DNNs); Hybrid acoustic features (HAF); Long short-term memory (LSTM); Mel-frequency cepstrum coefficient (MFCC); Multi-layer perceptron (MLP); Principal component analysis (PCA); Speech emotion recognition (SER)
DOI
Not available
Abstract
This paper presents the analysis and classification of speech spectrograms for recognizing emotions in the RAVDESS dataset. Features are extracted from speech utterances using Mel-frequency cepstrum coefficients (MFCC). Deep neural networks (DNNs) are then employed to classify speech into six emotions (happy, sad, neutral, calm, disgust, and fear). First, the paper presents a comprehensive comparative study of DNNs on prosodic features and reports the outcomes of all DNNs. Second, it analyzes a Bag of Visual Words (BoVW) approach that extracts speeded-up robust features (SURF), clusters them using K-means, and classifies them into the aforementioned emotions using a support vector machine (SVM). Of the five DNNs deployed, (i) Long Short-Term Memory (LSTM) on MFCC and (ii) a Multi-Layer Perceptron (MLP) classifier on MFCC outperform the others, each giving an accuracy score of 0.70. The BoVW technique classified 53% of samples correctly. The proposed methodology therefore constructs a Hybrid of Acoustic Features (HAF) and feeds them into an ensemble of bagged multi-layer perceptron classifiers, yielding an accuracy of 85%. It also achieves precision scores between 0.77 and 0.88 across the six emotions.
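The Bag of Visual Words step mentioned in the abstract (cluster local descriptors with K-means, then describe each image by how often each cluster occurs) can be illustrated with a minimal sketch. This is not the authors' implementation: the descriptors below are synthetic Gaussian vectors standing in for SURF features, the helper names (`kmeans`, `bovw_histogram`, `fake_descriptors`) are hypothetical, and the SVM classification stage is omitted.

```python
import math
import random


def nearest(centroids, point):
    """Return the index of the centroid closest to point (Euclidean)."""
    return min(range(len(centroids)), key=lambda i: math.dist(centroids[i], point))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; the k centroids act as the visual vocabulary."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(centroids, p)].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster went empty
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def bovw_histogram(descriptors, centroids):
    """Encode one image as a normalized histogram of visual-word counts."""
    counts = [0] * len(centroids)
    for d in descriptors:
        counts[nearest(centroids, d)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def fake_descriptors(rng, center, n=30, dim=4):
    """Assumption: synthetic stand-in for the SURF descriptors of one spectrogram."""
    return [[rng.gauss(center, 0.1) for _ in range(dim)] for _ in range(n)]


rng = random.Random(42)
image_a = fake_descriptors(rng, 0.0)  # descriptors of a first hypothetical image
image_b = fake_descriptors(rng, 5.0)  # descriptors of a second, very different image

# Build the vocabulary from all descriptors, then encode each image.
vocabulary = kmeans(image_a + image_b, k=2)
hist_a = bovw_histogram(image_a, vocabulary)
hist_b = bovw_histogram(image_b, vocabulary)
```

In a full pipeline, fixed-length histograms such as `hist_a` and `hist_b` would be the feature vectors passed to a classifier; the vocabulary size `k=2` is chosen only to keep the toy example small.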
Pages: 6944 - 6973
Number of pages: 29
Related papers
  • [1] Understanding human emotions through speech spectrograms using deep neural network
    Gupta, Vedika
    Juyal, Stuti
    Hu, Yu-Chen
    [J]. JOURNAL OF SUPERCOMPUTING, 2022, 78 (05): 6944 - 6973