Speech emotion recognition using overlapping sliding window and Shapley additive explainable deep neural network

Cited by: 5
Authors
Pham, Nhat Truong [1 ,2 ]
Nguyen, Sy Dzung [3 ,4 ]
Nguyen, Vu Song Thuy [5 ]
Pham, Bich Ngoc Hong [6 ]
Dang, Duc Ngoc Minh [7 ]
Affiliations
[1] Ton Duc Thang Univ, Inst Computat Sci, Div Computat Mechatron, Ho Chi Minh City, Vietnam
[2] Ton Duc Thang Univ, Fac Elect & Elect Engn, Ho Chi Minh City, Vietnam
[3] Van Lang Univ, Inst Computat Sci & Artificial Intelligence, Lab Computat Mechatron, Ho Chi Minh City, Vietnam
[4] Van Lang Univ, Fac Mech Elect & Comp Engn, Sch Technol, Ho Chi Minh City, Vietnam
[5] Michigan State Univ, Dept Comp Sci & Engn, Michigan, MI USA
[6] Ho Chi Minh City Open Univ, Fac Informat Technol, Ho Chi Minh City, Vietnam
[7] FPT Univ, Comp Fundamental Dept, Ho Chi Minh City, Vietnam
Keywords
Feature extraction; overlapping sliding window; pattern recognition network; SHAP analysis; speech emotion recognition; TRANSFORM; ALGORITHM; BEARINGS
DOI
10.1080/24751839.2023.2187278
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Speech emotion recognition (SER) has several applications, such as e-learning, human-computer interaction, customer service, and healthcare systems. Although researchers have investigated many techniques to improve the accuracy of SER, feature extraction, classifier schemes, and computational costs remain challenging. To address these problems, we propose a new set of 1D features for SER, extracted using an overlapping sliding window (OSW) technique. In addition, a deep neural network-based classifier scheme called the deep Pattern Recognition Network (PRN) is designed to categorize emotional states from the new set of 1D features. We evaluate the proposed method on the Emo-DB and AESSD datasets, which contain several different emotional states. The experimental results show that the proposed method achieves accuracies of 98.5% and 87.1% on the Emo-DB and AESSD datasets, respectively. Its accuracy is comparable to or better than that of state-of-the-art and other current approaches that use 1D features on the same datasets for SER. Furthermore, SHAP (SHapley Additive exPlanations) analysis is employed to interpret the prediction model and to assist system developers in selecting the optimal features to integrate into the desired system.
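The abstract does not specify how the overlapping sliding window is configured, so the following is only a minimal sketch of the general idea, not the authors' feature set. The function names, the window size, the hop length, and the choice of per-window mean and standard deviation are all assumptions made for illustration.

```python
from statistics import fmean, pstdev
from typing import List, Sequence


def overlapping_windows(signal: Sequence[float], size: int, hop: int) -> List[Sequence[float]]:
    """Split `signal` into windows of `size` samples whose starts are `hop` apart.

    When hop < size, consecutive windows share size - hop samples.
    A trailing remainder shorter than `size` is dropped.
    """
    if size <= 0 or hop <= 0:
        raise ValueError("size and hop must be positive")
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]


def window_features(signal: Sequence[float], size: int, hop: int) -> List[float]:
    """Flatten per-window (mean, population std) pairs into one 1D feature vector.

    The two statistics are placeholders; the paper's actual features are not
    described in the abstract.
    """
    features: List[float] = []
    for window in overlapping_windows(signal, size, hop):
        features.extend([fmean(window), pstdev(window)])
    return features


# Toy signal: 8 samples, windows of 4 samples with a hop of 2 (50% overlap).
signal = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
windows = overlapping_windows(signal, size=4, hop=2)
features = window_features(signal, size=4, hop=2)
print(len(windows), len(features))
```

With these toy settings each sample after the first hop appears in two windows, which is the property that distinguishes an overlapping window from simple non-overlapping framing.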
Pages: 317-335
Number of pages: 19
Related papers
50 in total
  • [22] Improvement Of Speech Emotion Recognition with Neural Network Classifier by Using Speech Spectrogram
    Prasomphan, Sathit
    2015 INTERNATIONAL CONFERENCE ON SYSTEMS, SIGNALS AND IMAGE PROCESSING (IWSSIP 2015), 2015, : 73 - 76
  • [23] Deep scattering network for speech emotion recognition
    Singh, Premjeet
    Saha, Goutam
    Sahidullah, Md
    29TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2021), 2021, : 131 - 135
  • [24] Speech Emotion Recognition with Hybrid Neural Network
    Wei, Chuanzheng
    Sun, Xiao
    Tian, Fang
    Ren, Fuji
    5TH INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING AND COMMUNICATIONS (BIGCOM 2019), 2019, : 298 - 302
  • [25] Epileptic signal classification using convolutional neural network and Shapley additive explainable artificial intelligence method
    Rathod, Prajakta
    Naik, Shefali
    Bhalodiya, Jayendra M.
    Neural Computing and Applications, 2025, 37 (06) : 4937 - 4955
  • [26] Research on Speech Emotion Recognition Technology based on Deep and Shallow Neural Network
    Wang, Jian
    Han, Zhiyan
    PROCEEDINGS OF THE 38TH CHINESE CONTROL CONFERENCE (CCC), 2019, : 3555 - 3558
  • [27] Enhancing Speech Emotion Recognition Using Deep Convolutional Neural Networks
    Islam, M. M. Manjurul
    Kabir, Md Alamgir
    Sheikh, Alamin
    Saiduzzaman, Muhammad
    Hafid, Abdelakram
    Abdullah, Saad
    PROCEEDINGS OF THE 2024 9TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING TECHNOLOGIES, ICMLT 2024, 2024, : 95 - 100
  • [28] DOA Estimation Using Deep Neural Network with Angular Sliding Window
    Li, Yang
    Huang, Zanhu
    Liang, Can
    Zhang, Liang
    Wang, Yanhua
    Wang, Junfu
    Zhang, Yi
    Lv, Hongfen
    ELECTRONICS, 2023, 12 (04)
  • [29] Facial Emotion Recognition Using Deep Convolutional Neural Network
    Pranav, E.
    Kamal, Suraj
    Chandran, Satheesh C.
    Supriya, M. H.
    2020 6TH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING AND COMMUNICATION SYSTEMS (ICACCS), 2020, : 317 - 320
  • [30] Neural Comb Filtering Using Sliding Window Attention Network for Speech Enhancement
    Parvathala, Venkatesh
    Andhavarapu, Sivaganesh
    Pamisetty, Giridhar
    Murty, K. Sri Rama
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2023, 42 (01) : 322 - 343