Enhancing speech emotion recognition through deep learning and handcrafted feature fusion

Cited by: 0
Authors
Eris, Fatma Gunes [1]
Akbal, Erhan [1]
Affiliations
[1] Firat Univ, Coll Technol, Dept Digital Forens Engn, TR-23100 Elazig, Turkiye
Keywords
Acoustic feature extraction; Feature engineering; Emotion recognition; Deep learning; Feature fusion
DOI
10.1016/j.apacoust.2024.110070
CLC number
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
In this paper, we present a novel approach to speech emotion recognition (SER). The proposed model fuses deep learning-based and handcrafted audio features through an iterative feature selection and majority voting pipeline to maximize classification accuracy. The wav2vec2 model and the openSMILE audio processing library are used to extract the deep and handcrafted features, respectively. Feature selection and majority voting are then applied to identify the best-performing selection methods for the different feature sets and to combine their strengths. To ensure robustness, the experiments are performed on a diverse, extensive multi-corpus dataset built from four well-known, publicly available benchmark datasets: RAVDESS, SAVEE, CREMA-D, and TESS. These datasets are merged over six common emotions (sadness, happiness, fear, anger, surprise, and disgust), yielding 11,511 samples. The proposed method achieves results comparable to those reported in the existing literature; the pipeline yields a 3% improvement in classification accuracy, and the highest accuracy on the multi-corpus dataset is 92.55%.
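As a concrete illustration of the pipeline described in the abstract, the sketch below fuses the two feature streams and trains a voting ensemble on the selected features. It is a minimal sketch under stated assumptions, not the authors' implementation: the facebook/wav2vec2-base checkpoint, the openSMILE eGeMAPSv02 functionals, SelectKBest with an ANOVA F-score, and the SVM / random-forest / k-NN base classifiers are all illustrative choices; the paper's iterative selector set and voting details may differ.

```python
# Minimal sketch (assumptions noted above): fuse wav2vec2 embeddings with
# openSMILE eGeMAPS functionals, select features, and majority-vote.
import numpy as np
import torch
import opensmile
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

SR = 16_000  # wav2vec2 checkpoints expect 16 kHz mono audio

# Deep features: mean-pool the final hidden states of a pretrained wav2vec2.
extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
w2v = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base").eval()

def deep_features(waveform: np.ndarray) -> np.ndarray:
    inputs = extractor(waveform, sampling_rate=SR, return_tensors="pt")
    with torch.no_grad():
        hidden = w2v(**inputs).last_hidden_state   # (1, frames, 768)
    return hidden.mean(dim=1).squeeze(0).numpy()   # (768,) utterance embedding

# Handcrafted features: 88 eGeMAPSv02 functionals per utterance via openSMILE.
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.Functionals,
)

def handcrafted_features(waveform: np.ndarray) -> np.ndarray:
    return smile.process_signal(waveform, SR).values.squeeze(0)  # (88,)

def fused_features(waveforms) -> np.ndarray:
    # Concatenate both streams per utterance: (n_samples, 768 + 88).
    return np.stack([np.concatenate([deep_features(w), handcrafted_features(w)])
                     for w in waveforms])

def fit_pipeline(X: np.ndarray, y: np.ndarray, k: int = 256):
    # One of many possible selectors; the paper iterates over several.
    selector = SelectKBest(f_classif, k=k).fit(X, y)
    # Hard voting = majority vote over the base classifiers' predictions.
    ensemble = VotingClassifier(
        estimators=[("svm", SVC()),
                    ("rf", RandomForestClassifier(n_estimators=200)),
                    ("knn", KNeighborsClassifier(n_neighbors=7))],
        voting="hard",
    ).fit(selector.transform(X), y)
    return selector, ensemble
```

With labeled training audio, `selector, ensemble = fit_pipeline(fused_features(waves), labels)` fits the pipeline, and `ensemble.predict(selector.transform(fused_features(test_waves)))` returns the majority-voted emotion labels. Hard voting is used here because the abstract describes majority voting over classifier decisions rather than probability averaging.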
Pages: 11
Related papers
50 items in total
  • [21] Student's Feedback by emotion and speech recognition through Deep Learning
    Jain, Ati
    Sah, Hare Ram
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION, AND INTELLIGENT SYSTEMS (ICCCIS), 2021, : 442 - 447
  • [22] Metric Learning Based Feature Representation with Gated Fusion Model for Speech Emotion Recognition
    Gao, Yuan
    Liu, JiaXing
    Wang, Longbiao
    Dang, Jianwu
    [J]. INTERSPEECH 2021, 2021, : 4503 - 4507
  • [23] Novel feature fusion method for speech emotion recognition based on multiple kernel learning
    Zhao, L. (zhaoli@seu.edu.cn)
    [J]. Southeast University, (29)
  • [24] Emotion Recognition in Speech with Deep Learning Architectures
    Erdal, Mehmet
    Kaechele, Markus
    Schwenker, Friedhelm
    [J]. ARTIFICIAL NEURAL NETWORKS IN PATTERN RECOGNITION, 2016, 9896 : 298 - 311
  • [25] Speech Emotion Recognition Using Deep Learning
    Alagusundari, N.
    Anuradha, R.
    [J]. ARTIFICIAL INTELLIGENCE: THEORY AND APPLICATIONS, VOL 1, AITA 2023, 2024, 843 : 313 - 325
  • [26] Speech Emotion Recognition Using Deep Learning
    Ahmed, Waqar
    Riaz, Sana
    Iftikhar, Khunsa
    Konur, Savas
    [J]. ARTIFICIAL INTELLIGENCE XL, AI 2023, 2023, 14381 : 191 - 197
  • [27] Two-Way Feature Extraction for Speech Emotion Recognition Using Deep Learning
    Aggarwal, Apeksha
    Srivastava, Akshat
    Agarwal, Ajay
    Chahal, Nidhi
    Singh, Dilbag
    Alnuaim, Abeer Ali
    Alhadlaq, Aseel
    Lee, Heung-No
    [J]. SENSORS, 2022, 22 (06)
  • [28] A Feature Fusion Model with Data Augmentation for Speech Emotion Recognition
    Tu, Zhongwen
    Liu, Bin
    Zhao, Wei
    Yan, Raoxin
    Zou, Yang
    [J]. APPLIED SCIENCES-BASEL, 2023, 13 (07):
  • [29] Multimodal fusion: A study on speech-text emotion recognition with the integration of deep learning
    Shang, Yanan
    Fu, Tianqi
    [J]. INTELLIGENT SYSTEMS WITH APPLICATIONS, 2024, 24
  • [30] Language dialect based speech emotion recognition through deep learning techniques
    Rajendran, Sukumar
    Mathivanan, Sandeep Kumar
    Jayagopal, Prabhu
    Venkatasen, Maheshwari
    Pandi, Thanapal
    Somanathan, Manivannan Sorakaya
    Thangaval, Muthamilselvan
    Mani, Prasanna
    [J]. INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2021, 24 : 625 - 635