Effect on speech emotion classification of a feature selection approach using a convolutional neural network

Cited by: 21

Authors:

Amjad, Ammar [1]

Khan, Lal [1]

Chang, Hsien-Tsung [1,2,3,4]

Affiliations:

[1] Chang Gung Univ, Dept Comp Sci & Informat Engn, Taoyuan, Taiwan

[2] Chang Gung Mem Hosp, Dept Phys Med & Rehabil, Taoyuan, Taiwan

[3] Chang Gung Univ, Artificial Intelligence Res Ctr, Taoyuan, Taiwan

[4] Chang Gung Univ, Bachelor Program Artificial Intelligence, Taoyuan, Taiwan

Source:

PEERJ COMPUTER SCIENCE | 2021 / Volume 7

Keywords:

Speech emotion recognition; Feature extraction; Feature selection; Convolutional neural network; Mel-spectrogram; Data augmentation; RECOGNITION FEATURES; DEEP; FRAMEWORK; MODEL;

DOI:

10.7717/peerj-cs.766

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Speech emotion recognition (SER) is a challenging issue because it is not clear which features are effective for classification. Emotionally related features are always extracted from speech signals for emotional classification. Handcrafted features are mainly used for emotional identification from audio signals. However, these features are not sufficient to correctly identify the emotional state of the speaker. The advantages of a deep convolutional neural network (DCNN) are investigated in the proposed work. A pretrained framework is used to extract the features from speech emotion databases. In this work, we adopt the feature selection (FS) approach to find the discriminative and most important features for SER. Many algorithms are used for the emotion classification problem. We use the random forest (RF), decision tree (DT), support vector machine (SVM), multilayer perceptron classifier (MLP), and k-nearest neighbors (KNN) to classify seven emotions. All experiments are performed by utilizing four different publicly accessible databases. Our method obtains accuracies of 92.02%, 88.77%, 93.61%, and 77.23% for Emo-DB, SAVEE, RAVDESS, and IEMOCAP, respectively, for speaker-dependent (SD) recognition with the feature selection method. Furthermore, compared to current handcrafted feature-based SER methods, the proposed method shows the best results for speaker-independent SER. For Emo-DB, all classifiers attain an accuracy of more than 80% with or without the feature selection technique.
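The "select features, then classify" idea in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the feature selection method, so a one-way ANOVA F-score ranking is swapped in as a stand-in; the features are synthetic Gaussian vectors standing in for DCNN outputs; and only a plain k-nearest-neighbors classifier is shown out of the five classifiers listed. The function names (`make_data`, `f_scores`, `select_top_k`, `knn_predict`, `accuracy`) are hypothetical and do not come from the paper.

```python
import random
from collections import Counter

def make_data(n_per_class=30, n_classes=7, n_features=20, n_informative=5, seed=0):
    """Synthetic stand-in for DCNN features (assumption, not the paper's data).
    Only the first n_informative columns depend on the class label."""
    rng = random.Random(seed)
    X, y = [], []
    for c in range(n_classes):
        for _ in range(n_per_class):
            row = []
            for j in range(n_features):
                mean = 2.0 * ((c + j) % n_classes) if j < n_informative else 0.0
                row.append(rng.gauss(mean, 1.0))
            X.append(row)
            y.append(c)
    return X, y

def f_scores(X, y):
    """One-way ANOVA F-statistic per feature:
    between-class mean square divided by within-class mean square."""
    classes = sorted(set(y))
    n, k = len(X), len(classes)
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        grand = sum(col) / n
        between = within = 0.0
        for c in classes:
            vals = [v for v, label in zip(col, y) if label == c]
            m = sum(vals) / len(vals)
            between += len(vals) * (m - grand) ** 2
            within += sum((v - m) ** 2 for v in vals)
        scores.append((between / (k - 1)) / (within / (n - k)))
    return scores

def select_top_k(scores, k):
    """Indices of the k highest-scoring features."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]

def knn_predict(X_train, y_train, x, k=5):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(X_train, y_train)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def accuracy(X_train, y_train, X_test, y_test, cols):
    """k-NN test accuracy using only the feature columns listed in cols."""
    Xtr = [[r[j] for j in cols] for r in X_train]
    Xte = [[r[j] for j in cols] for r in X_test]
    hits = sum(knn_predict(Xtr, y_train, x) == t for x, t in zip(Xte, y_test))
    return hits / len(y_test)

# 70/30 split of shuffled sample indices.
X, y = make_data()
idx = list(range(len(X)))
random.Random(1).shuffle(idx)
split = int(0.7 * len(idx))
X_train, y_train = [X[i] for i in idx[:split]], [y[i] for i in idx[:split]]
X_test, y_test = [X[i] for i in idx[split:]], [y[i] for i in idx[split:]]

# Rank features on the training split only, so the test split stays unseen.
selected = select_top_k(f_scores(X_train, y_train), k=5)
acc_all = accuracy(X_train, y_train, X_test, y_test, list(range(len(X[0]))))
acc_sel = accuracy(X_train, y_train, X_test, y_test, selected)
print("selected feature indices:", sorted(selected))
print("accuracy, all features:", round(acc_all, 3))
print("accuracy, selected features:", round(acc_sel, 3))
```

The ranking is computed on the training split alone; scoring features on the full data set would leak test labels into the selection step and inflate the reported accuracy. The numbers printed here describe the synthetic data only and say nothing about the accuracies reported in the abstract.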


Pages: 28
