IMPROVING CONVOLUTIONAL RECURRENT NEURAL NETWORKS FOR SPEECH EMOTION RECOGNITION

Cited by: 8

Authors:

Meyer, Patrick ^{[1]}

Xu, Ziyi ^{[1]}

Fingscheidt, Tim ^{[1]}

Affiliations:

[1] Tech Univ Carolo Wilhelmina Braunschweig, Inst Commun Technol, D-38106 Braunschweig, Germany

Source:

2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT) | 2021

Keywords:

Speech emotion recognition; machine learning; log-mel spectrogram; BLSTM; FEATURES;

DOI:

10.1109/SLT48900.2021.9383513

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Deep learning has increased interest in speech emotion recognition (SER) and has produced diverse structures and methods to improve performance. In recent years, applying SER to a (log-mel) spectrogram, and thus treating SER as an image recognition task, has proven to be a promising method. Following the trend towards using a convolutional neural network (CNN) in combination with a bidirectional long short-term memory (BLSTM) layer and some subsequent fully connected layers, we advance the performance of this topology through several contributions: we integrate a multi-kernel width CNN, propose a BLSTM output summarization function, apply an enhanced feature representation, and introduce an effective training method. To foster insight into our proposed methods, we separately evaluate the impact of each modification in an ablation study. Based on our modifications, we obtain top results for this type of topology on IEMOCAP, with an unweighted average recall of 64.5% on average.
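
The abstract reports its result as an unweighted average recall (UAR), which is the mean of the per-class recalls, so each emotion class counts equally regardless of how many utterances it has. The following is a minimal sketch of that metric only, not the authors' evaluation code; the function name and the toy labels are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (every class weighted equally)."""
    total = defaultdict(int)    # number of true samples per class
    correct = defaultdict(int)  # number of correctly predicted samples per class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical toy labels, not taken from IEMOCAP or from the paper.
y_true = ["angry"] * 4 + ["happy"] * 2 + ["sad"] * 2 + ["neutral"] * 2
y_pred = ["angry", "angry", "angry", "happy",
          "happy", "sad",
          "sad", "sad",
          "neutral", "angry"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.4f}, accuracy = {accuracy:.4f}")
```

On imbalanced data the two numbers differ because plain accuracy is dominated by the larger classes, which is the usual reason for reporting UAR on emotion corpora.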

Pages: 365-372

Page count: 8

50 items in total

[21] Emotion Recognition from Speech using Artificial Neural Networks and Recurrent Neural Networks
Sharma, Shambhavi
[J]. 2021 11TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING, DATA SCIENCE & ENGINEERING (CONFLUENCE 2021), 2021, : 153 - 158
[22] Learning Salient Features for Speech Emotion Recognition Using Convolutional Neural Networks
Mao, Qirong
Dong, Ming
Huang, Zhengwei
Zhan, Yongzhao
[J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2014, 16 (08) : 2203 - 2213
[23] An Experimental Study of Speech Emotion Recognition Based on Deep Convolutional Neural Networks
Zheng, W. Q.
Yu, J. S.
Zou, Y. X.
[J]. 2015 INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII), 2015, : 827 - 831
[24] Convolutional Neural Networks for Speech Recognition
Abdel-Hamid, Ossama
Mohamed, Abdel-Rahman
Jiang, Hui
Deng, Li
Penn, Gerald
Yu, Dong
[J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (10) : 1533 - 1545
[25] DEEP CONVOLUTIONAL RECURRENT NEURAL NETWORK WITH ATTENTION MECHANISM FOR ROBUST SPEECH EMOTION RECOGNITION
Huang, Che-Wei
Narayanan, Shrikanth
[J]. 2017 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2017, : 583 - 588
[26] 3D Convolutional Recurrent Global Neural Network for Speech Emotion Recognition
Zayene, Baraa
Jlassi, Chiraz
Arous, Najet
[J]. 2020 5TH INTERNATIONAL CONFERENCE ON ADVANCED TECHNOLOGIES FOR SIGNAL AND IMAGE PROCESSING (ATSIP'2020), 2020,
[27] Speech emotion recognition based on improved masking EMD and convolutional recurrent neural network
Sun, Congshan
Li, Haifeng
Ma, Lin
[J]. FRONTIERS IN PSYCHOLOGY, 2023, 13
[28] Speech Emotion Recognition Using Deep Convolutional Neural Network and Simple Recurrent Unit
Jiang, Pengxu
Fu, Hongliang
Tao, Huawei
[J]. ENGINEERING LETTERS, 2019, 27 (04) : 901 - 906
[29] EEG-based emotion recognition with cascaded convolutional recurrent neural networks
Meng, Ming
Zhang, Yu
Ma, Yuliang
Gao, Yunyuan
Kong, Wanzeng
[J]. PATTERN ANALYSIS AND APPLICATIONS, 2023, 26 (02) : 783 - 795
[30] Deep Convolutional and Recurrent Neural Networks for Emotion Recognition from Human Behaviors
Deng, James J.
Leung, Clement H. C.
[J]. COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2020, PT II, 2020, 12250 : 550 - 561