Speech Emotion Recognition Using Deep Learning Transfer Models and Explainable Techniques

Cited by: 2
Authors
Kim, Tae-Wan [1 ]
Kwak, Keun-Chang [1 ]
Affiliations
[1] Chosun Univ, Dept Elect Engn, Interdisciplinary Program IT Bio Convergence Syst, Gwangju 61452, South Korea
Source
APPLIED SCIENCES-BASEL | 2024, Vol. 14, Issue 04
Keywords
speech emotion recognition; explainable model; deep learning; YAMNet; VGGish; audible feature;
DOI
10.3390/app14041553
Chinese Library Classification
O6 [Chemistry];
Discipline Classification Code
0703;
Abstract
This study aims to achieve greater reliability than conventional speech emotion recognition (SER) studies. This is accomplished through preprocessing techniques that reduce sources of uncertainty, a model that combines the structural strengths of its component networks, and the application of several explanatory techniques. Interpretability is improved by removing uncertain training data, applying data recorded in different environments, and using techniques that explain the reasoning behind the results. We designed a generalized model using three different datasets; each speech sample is converted into a spectrogram image through STFT preprocessing. The spectrogram is divided along the time axis, with overlap, to match the input size of the model. Each segment is represented as a Gaussian distribution, and the quality of the data is assessed by the correlation coefficient between distributions. As a result, the scale of the data is reduced and uncertainty is minimized. VGGish and YAMNet are among the most representative pretrained deep learning networks used for speech processing. In speech signal processing, it is often advantageous to use these pretrained models together rather than individually, which leads to the construction of an ensemble deep network. Finally, several explainable techniques (Grad-CAM, LIME, and occlusion sensitivity) are applied to analyze the classification results. The model adapts to voices recorded in various environments and yields a classification accuracy of 87%, surpassing that of the individual models. In addition, the outputs are examined with the explainable models to extract the essential emotional regions, which are converted into audio files for auditory analysis using Grad-CAM in the time domain. Through this study, we reduce the uncertainty of the activation regions generated by Grad-CAM by combining the interpretability established in previous studies with effective preprocessing and a fusion model, and the additional explainable techniques allow the results to be analyzed from more diverse perspectives.
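The preprocessing described in the abstract (STFT spectrogram, overlapping time-axis segmentation, a Gaussian profile for each segment, and correlation-based removal of uncertain segments) can be sketched roughly as follows. This is a minimal illustration under assumptions, not the authors' released code: the segment length, overlap ratio, correlation threshold, and the use of per-frequency-bin mean and standard deviation as the Gaussian profile are example choices made here for clarity.

# Minimal sketch (not the authors' code) of the preprocessing described above:
# STFT spectrogram, overlapping time segments, Gaussian (mean/std) profile per
# segment, and correlation-based filtering of uncertain segments.
# Segment length, overlap, and threshold are illustrative assumptions.
import numpy as np
import librosa

def spectrogram_segments(path, sr=16000, n_fft=512, hop_length=160,
                         seg_frames=96, overlap=0.5):
    """Load speech, compute a log-magnitude STFT spectrogram, and split it
    along the time axis into overlapping fixed-size segments."""
    y, _ = librosa.load(path, sr=sr)
    spec = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop_length))
    log_spec = librosa.amplitude_to_db(spec, ref=np.max)
    step = max(1, int(seg_frames * (1.0 - overlap)))
    return [log_spec[:, i:i + seg_frames]
            for i in range(0, log_spec.shape[1] - seg_frames + 1, step)]

def filter_uncertain_segments(segments, corr_threshold=0.6):
    """Describe each segment by the per-bin mean and standard deviation
    (a Gaussian profile) and keep only segments whose profile correlates
    strongly with the average profile; low-correlation segments are
    treated as uncertain data and discarded."""
    profiles = np.stack([np.concatenate([s.mean(axis=1), s.std(axis=1)])
                         for s in segments])
    reference = profiles.mean(axis=0)
    return [s for s, p in zip(segments, profiles)
            if np.corrcoef(p, reference)[0, 1] >= corr_threshold]

In the study, the retained segments would then be resized to the input format expected by VGGish and YAMNet, classified by the ensemble network, and analyzed with Grad-CAM, LIME, and occlusion sensitivity.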
Pages: 23