IMPROVING CONVOLUTIONAL RECURRENT NEURAL NETWORKS FOR SPEECH EMOTION RECOGNITION

Cited by: 8
Authors
Meyer, Patrick [1]
Xu, Ziyi [1]
Fingscheidt, Tim [1]
Affiliations
[1] Tech Univ Carolo Wilhelmina Braunschweig, Inst Commun Technol, D-38106 Braunschweig, Germany
Keywords
Speech emotion recognition; machine learning; log-mel spectrogram; BLSTM; FEATURES;
DOI
10.1109/SLT48900.2021.9383513
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Deep learning has increased the interest in speech emotion recognition (SER) and has put forth diverse structures and methods to improve performance. In recent years it has turned out that applying SER on a (log-mel) spectrogram and thus, interpreting SER as an image recognition task is a promising method. Following the trend towards using a convolutional neural network (CNN) in combination with a bidirectional long short-term memory (BLSTM) layer, and some subsequent fully connected layers, in this work, we advance the performance of this topology by several contributions: We integrate a multi-kernel width CNN, propose a BLSTM output summarization function, apply an enhanced feature representation, and introduce an effective training method. In order to foster insight into our proposed methods, we separately evaluate the impact of each modification in an ablation study. Based on our modifications, we obtain top results for this type of topology on IEMOCAP with an unweighted average recall of 64.5% on average.
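The abstract reports its result as unweighted average recall (UAR), the mean of the per-class recalls, so that every emotion class counts equally regardless of how many utterances it has. The following is a minimal sketch of that metric only; the function name and the toy labels are hypothetical illustrations and are not taken from the paper or from IEMOCAP.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1

    # Recall of class c = correct[c] / total[c]; average with equal weight.
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical toy labels (not data from the paper).
y_true = ["neutral"] * 4 + ["angry"] * 2
y_pred = ["neutral", "neutral", "neutral", "angry", "angry", "sad"]
uar = unweighted_average_recall(y_true, y_pred)
```

On these toy labels the recall is 3/4 for "neutral" and 1/2 for "angry", so the UAR is 0.625, while plain accuracy is 4/6. The gap shows why UAR is commonly preferred when class sizes are imbalanced.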
Pages: 365-372
Number of pages: 8
Related papers
50 in total
  • [21] Emotion Recognition from Speech using Artificial Neural Networks and Recurrent Neural Networks
    Sharma, Shambhavi
    [J]. 2021 11TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING, DATA SCIENCE & ENGINEERING (CONFLUENCE 2021), 2021, : 153 - 158
  • [22] Learning Salient Features for Speech Emotion Recognition Using Convolutional Neural Networks
    Mao, Qirong
    Dong, Ming
    Huang, Zhengwei
    Zhan, Yongzhao
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2014, 16 (08) : 2203 - 2213
  • [23] An Experimental Study of Speech Emotion Recognition Based on Deep Convolutional Neural Networks
    Zheng, W. Q.
    Yu, J. S.
    Zou, Y. X.
    [J]. 2015 INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII), 2015, : 827 - 831
  • [24] Convolutional Neural Networks for Speech Recognition
    Abdel-Hamid, Ossama
    Mohamed, Abdel-Rahman
    Jiang, Hui
    Deng, Li
    Penn, Gerald
    Yu, Dong
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (10) : 1533 - 1545
  • [25] DEEP CONVOLUTIONAL RECURRENT NEURAL NETWORK WITH ATTENTION MECHANISM FOR ROBUST SPEECH EMOTION RECOGNITION
    Huang, Che-Wei
    Narayanan, Shrikanth
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2017, : 583 - 588
  • [26] 3D Convolutional Recurrent Global Neural Network for Speech Emotion Recognition
    Zayene, Baraa
    Jlassi, Chiraz
    Arous, Najet
    [J]. 2020 5TH INTERNATIONAL CONFERENCE ON ADVANCED TECHNOLOGIES FOR SIGNAL AND IMAGE PROCESSING (ATSIP'2020), 2020,
  • [27] Speech emotion recognition based on improved masking EMD and convolutional recurrent neural network
    Sun, Congshan
    Li, Haifeng
    Ma, Lin
    [J]. FRONTIERS IN PSYCHOLOGY, 2023, 13
  • [28] Speech Emotion Recognition Using Deep Convolutional Neural Network and Simple Recurrent Unit
    Jiang, Pengxu
    Fu, Hongliang
    Tao, Huawei
    [J]. ENGINEERING LETTERS, 2019, 27 (04) : 901 - 906
  • [29] EEG-based emotion recognition with cascaded convolutional recurrent neural networks
    Meng, Ming
    Zhang, Yu
    Ma, Yuliang
    Gao, Yunyuan
    Kong, Wanzeng
    [J]. PATTERN ANALYSIS AND APPLICATIONS, 2023, 26 (02) : 783 - 795
  • [30] Deep Convolutional and Recurrent Neural Networks for Emotion Recognition from Human Behaviors
    Deng, James J.
    Leung, Clement H. C.
    [J]. COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2020, PT II, 2020, 12250 : 550 - 561