Effect on speech emotion classification of a feature selection approach using a convolutional neural network

Cited by: 21
Authors
Amjad, Ammar [1]
Khan, Lal [1]
Chang, Hsien-Tsung [1, 2, 3, 4]
Affiliations
[1] Chang Gung Univ, Dept Comp Sci & Informat Engn, Taoyuan, Taiwan
[2] Chang Gung Mem Hosp, Dept Phys Med & Rehabil, Taoyuan, Taiwan
[3] Chang Gung Univ, Artificial Intelligence Res Ctr, Taoyuan, Taiwan
[4] Chang Gung Univ, Bachelor Program Artificial Intelligence, Taoyuan, Taiwan
Keywords
Speech emotion recognition; Feature extraction; Feature selection; Convolutional neural network; Mel-spectrogram; Data augmentation; RECOGNITION FEATURES; DEEP; FRAMEWORK; MODEL
DOI
10.7717/peerj-cs.766
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Speech emotion recognition (SER) is a challenging problem because it is not clear which features are effective for classification. Emotion-related features are typically extracted from speech signals for emotion classification, and handcrafted features are the ones mainly used to identify emotion from audio signals. However, these features are not sufficient to correctly identify the emotional state of the speaker. The proposed work investigates the advantages of a deep convolutional neural network (DCNN). A pretrained framework is used to extract features from speech emotion databases, and a feature selection (FS) approach is adopted to find the most discriminative and important features for SER. For classification, we use random forest (RF), decision tree (DT), support vector machine (SVM), multilayer perceptron (MLP), and k-nearest neighbors (KNN) classifiers to classify seven emotions. All experiments are performed on four publicly accessible databases. Our method obtains accuracies of 92.02%, 88.77%, 93.61%, and 77.23% for Emo-DB, SAVEE, RAVDESS, and IEMOCAP, respectively, for speaker-dependent (SD) recognition with the feature selection method. Furthermore, compared to current handcrafted feature-based SER methods, the proposed method shows the best results for speaker-independent SER. For Emo-DB, all classifiers attain an accuracy of more than 80% with or without the feature selection technique.
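The pipeline outlined in the abstract (deep features, then feature selection, then five classifiers) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract does not name the feature selection algorithm, so `SelectKBest` with an ANOVA F-test is an assumed stand-in. The synthetic data from `make_classification` is an assumed placeholder for features extracted by a pretrained DCNN. The helper names `make_stand_in_features`, `build_classifiers`, and `evaluate`, and all hyperparameters, are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def make_stand_in_features(n_samples=700, n_features=256, n_classes=7, seed=0):
    """Synthetic placeholder for DCNN features (assumption: no real audio here)."""
    return make_classification(
        n_samples=n_samples,
        n_features=n_features,
        n_informative=n_features // 8,
        n_redundant=n_features // 16,
        n_classes=n_classes,
        n_clusters_per_class=1,
        random_state=seed,
    )


def build_classifiers(seed=0):
    """The five classifier families named in the abstract; settings are illustrative."""
    return {
        "RF": RandomForestClassifier(n_estimators=100, random_state=seed),
        "DT": DecisionTreeClassifier(random_state=seed),
        "SVM": SVC(kernel="rbf"),
        "MLP": MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=seed),
        "KNN": KNeighborsClassifier(n_neighbors=5),
    }


def evaluate(X, y, k_selected=None, seed=0):
    """Return held-out accuracy per classifier, with or without feature selection."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    scores = {}
    for name, clf in build_classifiers(seed).items():
        steps = [StandardScaler()]
        if k_selected is not None:
            # Assumed FS method: keep the k features with the highest ANOVA F-score.
            steps.append(SelectKBest(f_classif, k=k_selected))
        steps.append(clf)
        model = make_pipeline(*steps)
        # Scaling and selection are fit on the training split only.
        model.fit(X_tr, y_tr)
        scores[name] = model.score(X_te, y_te)
    return scores


if __name__ == "__main__":
    X, y = make_stand_in_features()
    without_fs = evaluate(X, y)
    with_fs = evaluate(X, y, k_selected=64)
    for name in without_fs:
        print(f"{name}: no FS = {without_fs[name]:.3f}, FS = {with_fs[name]:.3f}")
```

Putting the selector inside the pipeline keeps feature scoring from seeing the held-out samples. The random split used here mixes speakers across train and test, which loosely corresponds to a speaker-dependent setting; a speaker-independent evaluation would need a split grouped by speaker identity.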
Pages: 28
Related papers
50 results in total
  • [1] Effect on speech emotion classification of a feature selection approach using a convolutional neural network
    Amjad, Ammar
    Khan, Lal
    Chang, Hsien-Tsung
    [J]. PeerJ Computer Science, 2021, 7
  • [2] Impact of Feature Selection Algorithm on Speech Emotion Recognition Using Deep Convolutional Neural Network
    Farooq, Misbah
    Hussain, Fawad
    Baloch, Naveed Khan
    Raja, Fawad Riasat
    Yu, Heejung
    Zikria, Yousaf Bin
    [J]. SENSORS, 2020, 20 (21) : 1 - 18
  • [3] Impact of Feature Extraction and Feature Selection Algorithms on Punjabi Speech Emotion Recognition Using Convolutional Neural Network
    Kaur, Kamaldeep
    Singh, Parminder
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2022, 21 (05)
  • [4] Emotion Classification Based on Convolutional Neural Network Using Speech Data
    Vrebcevic, N.
    Mijic, I.
    Petrinovic, D.
    [J]. 2019 42ND INTERNATIONAL CONVENTION ON INFORMATION AND COMMUNICATION TECHNOLOGY, ELECTRONICS AND MICROELECTRONICS (MIPRO), 2019, : 1007 - 1012
  • [5] A hybrid convolutional neural network approach for feature selection and disease classification
    Debata, Prajna Paramita
    Mohapatra, Puspanjali
    [J]. TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, 2021, 29 : 2580 - 2599
  • [8] Multimodal speech emotion recognition and classification using convolutional neural network techniques
    Christy, A.
    Vaithyasubramanian, S.
    Jesudoss, A.
    Praveena, M. D. Anto
    [J]. INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2020, 23 (02) : 381 - 388
  • [9] Visual Attribute Classification Using Feature Selection and Convolutional Neural Network
    Qian, Rongqiang
    Yue, Yong
    Coenen, Frans
    Zhang, Bailing
    [J]. PROCEEDINGS OF 2016 IEEE 13TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP 2016), 2016, : 649 - 653
  • [10] An Approach for Cancer-Type Classification Using Feature Selection Techniques with Convolutional Neural Network
    Almuayqil, Saleh N.
    Elbashir, Murtada K.
    Ezz, Mohamed
    Mohammed, Mohanad
    Mostafa, Ayman Mohamed
    Alruily, Meshrif
    Hamouda, Eslam
    [J]. APPLIED SCIENCES-BASEL, 2023, 13 (19):