BLSTM and CNN Stacking Architecture for Speech Emotion Recognition

Cited by: 15
Authors
Li, Dongdong [1, 2, 3]
Sun, Linyu [2]
Xu, Xinlei [2]
Wang, Zhe [1, 2]
Zhang, Jing [2]
Du, Wenli [1]
Affiliations
[1] East China Univ Sci & Technol, Key Lab Adv Control & Optimizat Chem Proc, Minist Educ, Shanghai 200237, Peoples R China
[2] East China Univ Sci & Technol, Dept Comp Sci & Engn, Shanghai 200237, Peoples R China
[3] Soochow Univ, Prov Key Lab Comp Informat Proc Technol, Suzhou 215006, Peoples R China
Keywords
Speech emotion recognition; Convolutional neural network; Bidirectional long short term memory; Stacking; NETWORK;
DOI
10.1007/s11063-021-10581-z
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Speech Emotion Recognition (SER), the task of distinguishing and interpreting the emotions carried in speech, remains a major challenge. Deep learning has proved effective at handling acoustic features. For instance, Bidirectional Long Short-Term Memory (BLSTM) is well suited to modeling time-series acoustic features, and a Convolutional Neural Network (CNN) can discover the local structure among different features. This paper proposes the BLSTM and CNN Stacking Architecture (BCSA) to improve emotion recognition. Feature matrices are sliced to match the input formats of the BLSTM and the CNN. To exploit the different roles of the two models, Stacking is used to integrate them. Specifically, to guard against overfitting, the probability estimates from the BLSTM and the CNN are combined into new data using K-fold cross-validation. Finally, logistic regression is fitted to the new data to recognize emotions. The experimental results demonstrate that the proposed architecture performs better than either single model. Furthermore, compared with the state-of-the-art SER models known to the authors, BCSA may be more suitable for SER because it integrates time-series acoustic features with the local structure among different features.
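The stacking step described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the two base learners are simple nearest-centroid stand-ins for the BLSTM and the CNN, the "probabilistic quantities" are assumed to be per-class probabilities, the data are synthetic rather than speech features, and every name here (`CentroidModel`, `out_of_fold_features`, `fit_logistic`, and so on) is hypothetical. It shows only the mechanism: K-fold out-of-fold probabilities become new data, and a logistic regression is fitted to them.

```python
import math
import random

def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def softmax(scores):
    m = max(scores)
    e = [math.exp(s - m) for s in scores]
    total = sum(e)
    return [v / total for v in e]

class CentroidModel:
    """Stand-in base learner (NOT a BLSTM or CNN): class probabilities are a
    softmax over negative squared distances to class centroids, computed on a
    chosen subset of feature columns."""
    def __init__(self, cols, n_classes):
        self.cols, self.n_classes = cols, n_classes

    def fit(self, X, y):
        self.centroids = []
        for c in range(self.n_classes):
            rows = [[x[j] for j in self.cols] for x, t in zip(X, y) if t == c]
            self.centroids.append([sum(col) / len(rows) for col in zip(*rows)])
        return self

    def predict_proba(self, x):
        v = [x[j] for j in self.cols]
        return softmax([-sum((a - b) ** 2 for a, b in zip(v, c))
                        for c in self.centroids])

def make_base_models():
    # Two views of the same features, mimicking two complementary models.
    return [CentroidModel([0, 1], 3), CentroidModel([2, 3], 3)]

def out_of_fold_features(X, y, k=5, seed=0):
    """For each sample, concatenate the probabilities predicted by base models
    that were trained on the other k-1 folds (never on that sample)."""
    n = len(X)
    meta = [None] * n
    for fold in kfold_indices(n, k, seed):
        held_out = set(fold)
        train = [i for i in range(n) if i not in held_out]
        models = [m.fit([X[i] for i in train], [y[i] for i in train])
                  for m in make_base_models()]
        for i in fold:
            meta[i] = [p for m in models for p in m.predict_proba(X[i])]
    return meta

def fit_logistic(Z, y, n_classes, lr=0.5, epochs=300):
    """Multinomial logistic regression trained by batch gradient descent."""
    d = len(Z[0]) + 1  # +1 for the bias term
    W = [[0.0] * d for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * d for _ in range(n_classes)]
        for z, t in zip(Z, y):
            zb = z + [1.0]
            p = softmax([sum(w * v for w, v in zip(row, zb)) for row in W])
            for c in range(n_classes):
                err = p[c] - (1.0 if c == t else 0.0)
                for j in range(d):
                    grad[c][j] += err * zb[j]
        for c in range(n_classes):
            for j in range(d):
                W[c][j] -= lr * grad[c][j] / len(Z)
    return W

def predict_logistic(W, z):
    zb = z + [1.0]
    return max(range(len(W)), key=lambda c: sum(w * v for w, v in zip(W[c], zb)))

# Synthetic 3-class data with 4 features (not speech data).
rng = random.Random(1)
centers = [(0, 0, 0, 0), (3, 0, 0, 3), (0, 3, 3, 0)]
X, y = [], []
for label, mu in enumerate(centers):
    for _ in range(30):
        X.append([m + rng.gauss(0, 1) for m in mu])
        y.append(label)

Z = out_of_fold_features(X, y, k=5)   # the "new data" for the meta-learner
W = fit_logistic(Z, y, n_classes=3)
train_acc = sum(predict_logistic(W, z) == t for z, t in zip(Z, y)) / len(y)
print(f"meta-learner accuracy on out-of-fold features: {train_acc:.2f}")
```

Because each row of `Z` comes from models that never saw that sample, the meta-learner is fitted on predictions that resemble what the base models would produce on unseen data, which is the overfitting safeguard the abstract refers to. In a full pipeline the base models would typically be refitted on all training data before scoring a separate test set; that step is omitted here.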
Pages: 4097-4115
Page count: 19
    [J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2021, 40 (02) : 2349 - 2360