Speech emotion recognition using overlapping sliding window and Shapley additive explainable deep neural network

Cited by: 5

Authors:

Pham, Nhat Truong [1,2]

Nguyen, Sy Dzung [3,4]

Nguyen, Vu Song Thuy [5]

Pham, Bich Ngoc Hong [6]

Dang, Duc Ngoc Minh [7]

Affiliations:

[1] Ton Duc Thang Univ, Inst Computat Sci, Div Computat Mechatron, Ho Chi Minh City, Vietnam

[2] Ton Duc Thang Univ, Fac Elect & Elect Engn, Ho Chi Minh City, Vietnam

[3] Van Lang Univ, Inst Computat Sci & Artificial Intelligence, Lab Computat Mechatron, Ho Chi Minh City, Vietnam

[4] Van Lang Univ, Fac Mech Elect & Comp Engn, Sch Technol, Ho Chi Minh City, Vietnam

[5] Michigan State Univ, Dept Comp Sci & Engn, Michigan, MI USA

[6] Ho Chi Minh City Open Univ, Fac Informat Technol, Ho Chi Minh City, Vietnam

[7] FPT Univ, Comp Fundamental Dept, Ho Chi Minh City, Vietnam

Source:

JOURNAL OF INFORMATION AND TELECOMMUNICATION | 2023 / Vol. 7 / Issue 03

Keywords:

Feature extraction; overlapping sliding window; pattern recognition network; SHAP analysis; speech emotion recognition; TRANSFORM; ALGORITHM; BEARINGS;

DOI:

10.1080/24751839.2023.2187278

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Speech emotion recognition (SER) has several applications, such as e-learning, human-computer interaction, customer service, and healthcare systems. Although researchers have investigated many techniques to improve the accuracy of SER, feature extraction, classifier schemes, and computational costs remain challenging. To address these problems, this study proposes a new set of 1D features for SER, extracted using an overlapping sliding window (OSW) technique. In addition, a deep neural network-based classifier scheme called the deep Pattern Recognition Network (PRN) is designed to categorize emotional states from the new set of 1D features. We evaluate the proposed method on the Emo-DB and AESSD datasets, which contain several different emotional states. The experimental results show that the proposed method achieves accuracies of 98.5% and 87.1% on the Emo-DB and AESSD datasets, respectively. Its accuracy is comparable to or better than that of state-of-the-art and current approaches that use 1D features on the same datasets for SER. Furthermore, SHAP (SHapley Additive exPlanations) analysis is employed to interpret the prediction model and to assist system developers in selecting the optimal features to integrate into the desired system.
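
As a rough illustration of the overlapping sliding window idea mentioned in the abstract, the sketch below splits a 1D signal into windows that share samples and computes two simple statistics per window. This record does not state the window length, the overlap, or the actual features used in the paper, so the names `overlapping_windows` and `window_features`, the parameters `window_size` and `hop_size`, and the choice of mean and energy are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def overlapping_windows(signal, window_size, hop_size):
    """Yield windows of `window_size` samples, advancing by `hop_size` each step."""
    if window_size <= 0 or hop_size <= 0:
        raise ValueError("window_size and hop_size must be positive")
    # Consecutive windows overlap whenever hop_size < window_size.
    for start in range(0, len(signal) - window_size + 1, hop_size):
        yield signal[start:start + window_size]


def window_features(signal, window_size, hop_size):
    """Return a flat 1D list holding (mean, energy) for every window.

    Mean and energy are placeholder features (an assumption), not the
    feature set described in the paper.
    """
    features = []
    for window in overlapping_windows(signal, window_size, hop_size):
        energy = sum(x * x for x in window) / len(window)
        features.extend([mean(window), energy])
    return features


# Toy signal: 8 samples, windows of 4 samples with 50% overlap (hop of 2).
samples = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
windows = list(overlapping_windows(samples, window_size=4, hop_size=2))
features = window_features(samples, window_size=4, hop_size=2)
```

With a hop smaller than the window, each sample contributes to more than one window, which is the property that distinguishes an overlapping window from simple non-overlapping framing.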

Pages: 317-335

Number of pages: 19
