Speech emotion recognition based on improved masking EMD and convolutional recurrent neural network

Cited by: 7
Authors
Sun, Congshan [1 ]
Li, Haifeng [1 ]
Ma, Lin [1 ]
Affiliations
[1] Harbin Inst Technol, Fac Comp, Harbin, Peoples R China
Source
FRONTIERS IN PSYCHOLOGY | 2023, Vol. 13
Funding
National Natural Science Foundation of China;
Keywords
speech emotion recognition; empirical mode decomposition; mode mixing; convolutional neural networks; bidirectional gated recurrent units; EMPIRICAL MODE DECOMPOSITION; HILBERT SPECTRUM; SIGNAL; FEATURES;
DOI
10.3389/fpsyg.2022.1075624
Chinese Library Classification
B84 [Psychology];
Discipline Classification Code
04 ; 0402 ;
Abstract
Speech emotion recognition (SER) is the key to human-computer emotion interaction. However, the nonlinear characteristics of speech emotion are variable, complex, and subtly changing, so accurately recognizing emotions from speech remains a challenge. Empirical mode decomposition (EMD), an effective decomposition method for nonlinear, non-stationary signals, has been successfully used to analyze emotional speech signals. However, the mode mixing problem of EMD degrades the performance of EMD-based methods for SER. Various improved EMD variants have been proposed to alleviate mode mixing, but they still suffer from residual mode mixing, residual noise, and long computation times, and their main parameters cannot be set adaptively. To overcome these problems, we propose a novel SER framework, named IMEMD-CRNN, that combines an improved masking signal-based EMD (IMEMD) with a convolutional recurrent neural network (CRNN). First, IMEMD is proposed to decompose speech. IMEMD is a novel disturbance-assisted EMD method that can adaptively determine the parameters of the masking signals according to the nature of the signal. Second, we extract 43-dimensional time-frequency features that characterize emotion from the intrinsic mode functions (IMFs) obtained by IMEMD. Finally, we feed these features into a CRNN to recognize emotions. In the CRNN, 2D convolutional neural network (CNN) layers capture nonlinear local temporal and frequency information of the emotional speech, and bidirectional gated recurrent unit (BiGRU) layers further learn temporal context information. Experiments on the publicly available TESS and Emo-DB datasets demonstrate the effectiveness of the proposed IMEMD-CRNN framework. The TESS dataset consists of 2,800 utterances covering seven emotions, recorded by two native English speakers. The Emo-DB dataset consists of 535 utterances covering seven emotions, recorded by ten native German speakers. The proposed IMEMD-CRNN framework achieves a state-of-the-art overall accuracy of 100% on the TESS dataset and 93.54% on the Emo-DB dataset over seven emotions. IMEMD alleviates mode mixing and yields IMFs with less noise and more physical meaning, with significantly improved efficiency. Our IMEMD-CRNN framework significantly improves the performance of emotion recognition.
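A key claim of the abstract is that IMEMD sets the masking-signal parameters adaptively from the signal itself, rather than requiring manual tuning. The following minimal numpy sketch illustrates that general idea by estimating a masking sinusoid's frequency from the FFT magnitude peak of the input; the function name, the `amp_scale` amplitude heuristic, and the exact estimation rule are illustrative assumptions, not the paper's actual IMEMD procedure.

```python
import numpy as np

def masking_signal(x, fs, amp_scale=1.6):
    """Build a masking sinusoid whose frequency tracks the input's
    dominant frequency (illustrative sketch, not the paper's method)."""
    # Estimate the dominant frequency from the FFT magnitude peak,
    # skipping the DC bin.
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    f_dom = freqs[np.argmax(spectrum[1:]) + 1]
    # Heuristic amplitude tied to the signal's energy (assumed scale).
    amp = amp_scale * np.std(x)
    t = np.arange(len(x)) / fs
    return amp * np.sin(2 * np.pi * f_dom * t), f_dom

# Two-tone test signal: a 50 Hz component plus a weaker 120 Hz component.
fs = 1000
t = np.arange(1024) / fs
x = np.sin(2 * np.pi * 50 * t) + 0.3 * np.sin(2 * np.pi * 120 * t)
mask, f_dom = masking_signal(x, fs)
```

In a masking EMD scheme, such a sinusoid would be added to and subtracted from the signal before decomposition, and the resulting first IMFs averaged, so that nearby frequency components are separated into distinct IMFs instead of mixing.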
Pages: 14
Related Papers
50 records
  • [41] Speech Emotion Recognition Based on Temporal-Spatial Learnable Graph Convolutional Neural Network
    Yan, Jingjie
    Li, Haihua
    Xu, Fengfeng
    Zhou, Xiaoyang
    Liu, Ying
    Yang, Yuan
    ELECTRONICS, 2024, 13 (11)
  • [42] Speech Emotion Recognition using Convolutional Neural Network with Audio Word-based Embedding
    Huang, Kun-Yi
    Wu, Chung-Hsien
    Hong, Qian-Bei
    Su, Ming-Hsiang
    Zeng, Yuan-Rong
    2018 11TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2018, : 265 - 269
  • [43] Speech Emotion Recognition Using Generative Adversarial Network and Deep Convolutional Neural Network
    Bhangale, Kishor
    Kothandaraman, Mohanaprasad
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2024, 43 (04) : 2341 - 2384
  • [45] EEG emotion recognition based on TQWT-features and hybrid convolutional recurrent neural network
    Zhong, Mei-yu
    Yang, Qing-yu
    Liu, Yi
    Zhen, Bo-yu
    Zhao, Feng-da
    Xie, Bei-bei
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2023, 79
  • [46] Multimodal speech emotion recognition and classification using convolutional neural network techniques
    Christy, A.
    Vaithyasubramanian, S.
    Jesudoss, A.
    Praveena, M. D. Anto
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2020, 23 (02) : 381 - 388
  • [48] Cascaded Convolutional Neural Network Architecture for Speech Emotion Recognition in Noisy Conditions
    Nam, Youngja
    Lee, Chankyu
    SENSORS, 2021, 21 (13)
  • [49] Speech Emotion Recognition Using Convolutional-Recurrent Neural Networks with Attention Model
    Mu, Yawei
    Gomez, Hernandez
    Cano Montes, Antonio
    Alcaraz Martinez, Carlos
    Wang, Xuetian
    Gao, Hongmin
    2ND INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING, INFORMATION SCIENCE AND INTERNET TECHNOLOGY, CII 2017, 2017, : 341 - 350
  • [50] Convolutional-Recurrent Neural Networks With Multiple Attention Mechanisms for Speech Emotion Recognition
    Jiang, Pengxu
    Xu, Xinzhou
    Tao, Huawei
    Zhao, Li
    Zou, Cairong
    IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2022, 14 (04) : 1564 - 1573