Speech emotion recognition using overlapping sliding window and Shapley additive explainable deep neural network

被引:5
|
作者
Pham, Nhat Truong [1 ,2 ]
Nguyen, Sy Dzung [3 ,4 ]
Nguyen, Vu Song Thuy [5 ]
Pham, Bich Ngoc Hong [6 ]
Dang, Duc Ngoc Minh [7 ]
机构
[1] Ton Duc Thang Univ, Inst Computat Sci, Div Computat Mechatron, Ho Chi Minh City, Vietnam
[2] Ton Duc Thang Univ, Fac Elect & Elect Engn, Ho Chi Minh City, Vietnam
[3] Van Lang Univ, Inst Computat Sci & Articial Intelligence, Lab Computat Mechatron, Ho Chi Minh City, Vietnam
[4] Van Lang Univ, Fac Mech Elect & Comp Engn, Sch Technol, Ho Chi Minh City, Vietnam
[5] Michigan State Univ, Dept Comp Sci & Engn, Michigan, MI USA
[6] Ho Chi Minh City Open Univ, Fac Informat Technol, Ho Chi Minh City, Vietnam
[7] FPT Univ, Comp Fundamental Dept, Ho Chi Minh City, Vietnam
关键词
Feature extraction; overlapping sliding window; pattern recognition network; SHAP analysis; speech emotion recognition; TRANSFORM; ALGORITHM; BEARINGS;
D O I
10.1080/24751839.2023.2187278
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Speech emotion recognition (SER) has several applications, such as e-learning, human-computer interaction, customer service, and healthcare systems. Although researchers have investigated lots of techniques to improve the accuracy of SER, it has been challenging with feature extraction, classifier schemes, and computational costs. To address the aforementioned problems, we propose a new set of 1D features extracted by using an overlapping sliding window (OSW) technique for SER in this study. In addition, a deep neural network-based classifier scheme called the deep Pattern Recognition Network (PRN) is designed to categorize emotional states from the new set of 1D features. We evaluate the proposed method on the Emo-DB and the AESSD datasets that contain several different emotional states. The experimental results show that the proposed method achieves an accuracy of 98.5% and 87.1% on the Emo-DB and AESSD datasets, respectively. It is also more comparable with accuracy to and better than the state-of-the-art and current approaches that use 1D features on the same datasets for SER. Furthermore, the SHAP (SHapley Additive exPlanations) analysis is employed for interpreting the prediction model to assist system developers in selecting the optimal features to integrate into the desired system.
引用
收藏
页码:317 / 335
页数:19
相关论文
共 50 条
  • [1] An explainable fast deep neural network for emotion recognition
    Di Luzio, Francesco
    Rosato, Antonello
    Panella, Massimo
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2025, 100
  • [2] A Study on Speech Emotion Recognition Using a Deep Neural Network
    Lee, Kyong Hee
    Choi, Hyun Kyun
    Jang, Byung Tae
    Kim, Do Hyun
    2019 10TH INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGY CONVERGENCE (ICTC): ICT CONVERGENCE LEADING THE AUTONOMOUS FUTURE, 2019, : 1162 - 1165
  • [3] Active Learning for Speech Emotion Recognition Using Deep Neural Network
    Abdelwahab, Mohammed
    Busso, Carlos
    2019 8TH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII), 2019,
  • [4] Speech Emotion Recognition Based on Deep Neural Network
    Zhu, Zijiang
    Hu, Yi
    Li, Junshan
    Li, Jianjun
    Wang, Junhua
    BASIC & CLINICAL PHARMACOLOGY & TOXICOLOGY, 2020, 126 : 154 - 154
  • [5] Speech Emotion Recognition Using Generative Adversarial Network and Deep Convolutional Neural Network
    Bhangale, Kishor
    Kothandaraman, Mohanaprasad
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2024, 43 (04) : 2341 - 2384
  • [6] Speech Emotion Recognition Using Generative Adversarial Network and Deep Convolutional Neural Network
    Kishor Bhangale
    Mohanaprasad Kothandaraman
    Circuits, Systems, and Signal Processing, 2024, 43 : 2341 - 2384
  • [7] Speech Emotion Recognition Using Deep Neural Network and Extreme Learning Machine
    Han, Kun
    Yu, Dong
    Tashev, Ivan
    15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 223 - 227
  • [8] SPEECH EMOTION RECOGNITION USING DEEP NEURAL NETWORK CONSIDERING VERBAL AND NONVERBAL SPEECH SOUNDS
    Huang, Kun-Yi
    Wu, Chung-Hsien
    Hong, Qian-Bei
    Su, Ming-Hsiang
    Chen, Yi-Hsuan
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 5866 - 5870
  • [9] Transfer Learning of Deep Neural Network for Speech Emotion Recognition
    Huang, Ying
    Hu, Mingqing
    Yu, Xianguo
    Wang, Tao
    Yang, Chen
    PATTERN RECOGNITION (CCPR 2016), PT II, 2016, 663 : 721 - 729
  • [10] Improvement of Speech Emotion Recognition by Deep Convolutional Neural Network and Speech Features
    Mohanty, Aniruddha
    Cherukuri, Ravindranath C.
    Prusty, Alok Ranjan
    THIRD CONGRESS ON INTELLIGENT SYSTEMS, CIS 2022, VOL 1, 2023, 608 : 117 - 129