Enhancing speech emotion recognition through deep learning and handcrafted feature fusion

Cited by: 0
Authors
Eris, Fatma Gunes [1]
Akbal, Erhan [1]
Affiliations
[1] Firat Univ, Coll Technol, Dept Digital Forens Engn, TR-23100 Elazig, Turkiye
Keywords
Acoustic feature extraction; Feature engineering; Emotion recognition; Deep learning; Feature fusion;
DOI
10.1016/j.apacoust.2024.110070
Chinese Library Classification
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
This paper investigates speech emotion recognition (SER) by combining deep learning-based and handcrafted audio features. The proposed model fuses the two feature types through an iterative feature selection and majority voting pipeline. Audio features are extracted with the wav2vec2 model and the openSMILE audio processing library. Feature selection and majority voting are then used to identify the best feature selection method for each feature set and to combine their strengths. To test robustness, the experiments use a large multi-corpus dataset built from four well-known, publicly available benchmark datasets: RAVDESS, SAVEE, CREMA-D, and TESS. The datasets are merged on six common emotions: sadness, happiness, fear, anger, surprise, and disgust. The resulting dataset comprises 11,511 samples across these categories. The proposed method achieves results comparable to those reported in the existing literature. The experimental results indicate that the proposed pipeline improves classification accuracy by 3%, and the highest accuracy achieved on the multi-corpus dataset is 92.55%.
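The abstract names feature fusion and majority voting but gives no implementation details. The following is a minimal sketch of those two steps under stated assumptions, not the authors' method: fusion is assumed to be plain concatenation of per-utterance vectors, and voting is assumed to be a per-sample mode over label predictions from several classifiers. The function names `fuse_features` and `majority_vote`, the tie-breaking rule, and the toy labels are all hypothetical.

```python
from collections import Counter


def fuse_features(deep_vec, handcrafted_vec):
    """Concatenate two per-utterance feature vectors (assumed fusion rule)."""
    return list(deep_vec) + list(handcrafted_vec)


def majority_vote(predictions):
    """Return the most common label per sample across several classifiers.

    predictions: list of equal-length label lists, one list per classifier.
    Ties go to the label predicted by the earliest classifier in the list
    (an assumption made here; the source does not specify tie handling).
    """
    if not predictions:
        raise ValueError("need at least one prediction list")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all prediction lists must have the same length")

    voted = []
    for labels in zip(*predictions):  # labels for one sample, in classifier order
        counts = Counter(labels)
        top = max(counts.values())
        # Pick the first label, in classifier order, that reaches the top count.
        voted.append(next(label for label in labels if counts[label] == top))
    return voted


# Toy example: three hypothetical classifiers, three utterances.
preds = [
    ["anger", "fear", "sadness"],
    ["anger", "happiness", "sadness"],
    ["fear", "happiness", "disgust"],
]
final_labels = majority_vote(preds)
```

In a real pipeline, each prediction list would come from a classifier trained on a different selected subset of the fused wav2vec2 and openSMILE features; that training step is omitted here.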
Pages: 11