Effect on speech emotion classification of a feature selection approach using a convolutional neural network

Cited by: 21
Authors
Amjad, Ammar [1]
Khan, Lal [1]
Chang, Hsien-Tsung [1,2,3,4]
Affiliations
[1] Chang Gung Univ, Dept Comp Sci & Informat Engn, Taoyuan, Taiwan
[2] Chang Gung Mem Hosp, Dept Phys Med & Rehabil, Taoyuan, Taiwan
[3] Chang Gung Univ, Artificial Intelligence Res Ctr, Taoyuan, Taiwan
[4] Chang Gung Univ, Bachelor Program Artificial Intelligence, Taoyuan, Taiwan
Keywords
Speech emotion recognition; Feature extraction; Feature selection; Convolutional neural network; Mel-spectrogram; Data augmentation; RECOGNITION FEATURES; DEEP; FRAMEWORK; MODEL
DOI
10.7717/peerj-cs.766
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Speech emotion recognition (SER) is challenging because it is not clear which features are most effective for classification. Emotion-related features are typically extracted from speech signals for classification, and handcrafted features are the most common choice for identifying emotion from audio. However, handcrafted features alone are not sufficient to identify the speaker's emotional state reliably. In this work, we investigate the advantages of a deep convolutional neural network (DCNN): a pretrained framework is used to extract features from speech emotion databases, and a feature selection (FS) approach is then applied to find the most discriminative and important features for SER. We classify seven emotions using five classifiers: random forest (RF), decision tree (DT), support vector machine (SVM), multilayer perceptron (MLP), and k-nearest neighbors (KNN). All experiments are performed on four publicly available databases. With feature selection, our method achieves speaker-dependent (SD) accuracies of 92.02%, 88.77%, 93.61%, and 77.23% on Emo-DB, SAVEE, RAVDESS, and IEMOCAP, respectively. Furthermore, compared with current handcrafted feature-based SER methods, the proposed method gives the best results for speaker-independent SER. On Emo-DB, all classifiers achieve an accuracy above 80% with or without feature selection.
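The abstract does not say which feature selection algorithm is used, so the following is only a minimal sketch of the general "extract features, select a discriminative subset, then classify" pipeline, under stated assumptions. It assumes a univariate one-way ANOVA F-test ranking (a common filter-style FS method, not necessarily the authors' choice) and uses k-nearest neighbors, one of the five classifiers named above, written in standard-library Python. The 20-dimensional random vectors are hypothetical stand-ins for features extracted by a pretrained DCNN, and all helper names (`anova_f_scores`, `select_top_k`, `make_synthetic`, etc.) are illustrative, not from the paper.

```python
import math
import random
from collections import Counter


def anova_f_scores(X, y):
    """One-way ANOVA F-statistic per feature column (higher = more class-discriminative)."""
    classes = sorted(set(y))
    n, d, k = len(X), len(X[0]), len(classes)
    scores = []
    for j in range(d):
        col = [row[j] for row in X]
        grand_mean = sum(col) / n
        ss_between = ss_within = 0.0
        for c in classes:
            vals = [col[i] for i in range(n) if y[i] == c]
            m = sum(vals) / len(vals)
            ss_between += len(vals) * (m - grand_mean) ** 2
            ss_within += sum((v - m) ** 2 for v in vals)
        ms_between = ss_between / (k - 1)
        ms_within = ss_within / (n - k)
        scores.append(ms_between / ms_within if ms_within > 0 else math.inf)
    return scores


def select_top_k(scores, k):
    """Indices of the k highest-scoring features."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def project(X, idx):
    """Keep only the selected feature columns."""
    return [[row[j] for j in idx] for row in X]


def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted((math.dist(x, xt), yt) for xt, yt in zip(X_train, y_train))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(X_train, y_train, X_test, y_test, k=3):
    hits = sum(knn_predict(X_train, y_train, x, k) == t for x, t in zip(X_test, y_test))
    return hits / len(y_test)


def make_synthetic(n_per_class=40, n_informative=3, n_noise=17, seed=0):
    """Hypothetical 'DCNN features': 3 class-dependent columns followed by 17 noisy ones."""
    rng = random.Random(seed)
    X, y = [], []
    for ci, label in enumerate(["angry", "happy", "sad"]):
        for _ in range(n_per_class):
            informative = [rng.gauss(3.0 * ci, 1.0) for _ in range(n_informative)]
            noise = [rng.gauss(0.0, 5.0) for _ in range(n_noise)]
            X.append(informative + noise)
            y.append(label)
    return X, y


def train_test_split(X, y, test_frac=0.3, seed=1):
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_frac))
    tr, te = idx[:cut], idx[cut:]
    return [X[i] for i in tr], [y[i] for i in tr], [X[i] for i in te], [y[i] for i in te]


X, y = make_synthetic()
X_tr, y_tr, X_te, y_te = train_test_split(X, y)
# Fit the feature ranking on training data only, so the test split does not leak into selection.
selected = select_top_k(anova_f_scores(X_tr, y_tr), 3)
acc_all = accuracy(X_tr, y_tr, X_te, y_te)
acc_selected = accuracy(project(X_tr, selected), y_tr, project(X_te, selected), y_te)
print("selected feature indices:", sorted(selected))
print(f"KNN accuracy, all features: {acc_all:.2f}; selected features: {acc_selected:.2f}")
```

Scoring features on the training split only matters: ranking them on all data would let test labels influence which features are kept and inflate the reported accuracy. On this toy data the noisy columns dominate the Euclidean distance, so discarding them is what lets KNN separate the classes; real DCNN features will not be this cleanly split.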
Pages: 28