BLSTM and CNN Stacking Architecture for Speech Emotion Recognition

Authors
Dongdong Li
Linyu Sun
Xinlei Xu
Zhe Wang
Jing Zhang
Wenli Du
Affiliations
[1] Ministry of Education, Key Laboratory of Advanced Control and Optimization for Chemical Processes
[2] East China University of Science and Technology, Department of Computer Science and Engineering
[3] East China University of Science and Technology, Provincial Key Laboratory for Computer Information Processing Technology
[4] Soochow University
Source
Neural Processing Letters | 2021 / Vol. 53
Keywords
Speech emotion recognition; Convolutional neural network; Bidirectional long short-term memory; Stacking
DOI
Not available
Abstract
Speech Emotion Recognition (SER), the task of distinguishing and interpreting the sentiments carried in speech, is a major challenge. Deep learning has proven effective at handling acoustic features. For instance, Bidirectional Long Short-Term Memory (BLSTM) is well suited to time-series acoustic features, and a Convolutional Neural Network (CNN) can discover the local structure among different features. This paper proposes the BLSTM and CNN Stacking Architecture (BCSA) to improve emotion recognition. To match the input formats of the BLSTM and the CNN, the feature matrices are sliced. To exploit the different roles of the BLSTM and the CNN, Stacking is employed to integrate the two models. Specifically, to limit overfitting, the probability estimates from the BLSTM and the CNN are combined into new data using K-fold cross-validation. Finally, a logistic regression model is fitted to the new data to recognize emotions. The experimental results demonstrate that the proposed architecture performs better than either single model. Furthermore, compared with the state-of-the-art SER models known to the authors, BCSA may be more suitable for SER because it integrates time-series acoustic features with the local structure among different features.
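The stacking step described in the abstract (out-of-fold probability estimates from the base models, then a logistic regression fitted on them) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The two base learners are hypothetical one-feature centroid classifiers standing in for the BLSTM and the CNN, the task is reduced to two classes, and every name (`k_fold_indices`, `make_centroid_model`, `out_of_fold_probs`, `fit_logistic`) and the toy data are assumptions made for illustration.

```python
import math

def k_fold_indices(n, k):
    """Split range(n) into k contiguous (train, validation) index pairs."""
    fold = n // k
    pairs = []
    for f in range(k):
        lo = f * fold
        hi = n if f == k - 1 else lo + fold
        train = [i for i in range(n) if i < lo or i >= hi]
        pairs.append((train, list(range(lo, hi))))
    return pairs

def make_centroid_model(feature):
    """Hypothetical stand-in for a base learner (BLSTM or CNN in the paper)."""
    def fit(X, y):
        means = {}
        for label in (0, 1):
            vals = [x[feature] for x, t in zip(X, y) if t == label]
            means[label] = sum(vals) / len(vals)
        def predict_proba(x):
            d0 = abs(x[feature] - means[0])
            d1 = abs(x[feature] - means[1])
            return d0 / (d0 + d1 + 1e-12)  # estimate of P(class 1)
        return predict_proba
    return fit

def out_of_fold_probs(fit_fns, X, y, k):
    """Build the 'new data': each row holds the base models' probabilities,
    predicted by models that never saw that sample during training."""
    Z = [[None] * len(fit_fns) for _ in X]
    for train, val in k_fold_indices(len(X), k):
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        for m, fit in enumerate(fit_fns):
            predict_proba = fit(X_tr, y_tr)
            for i in val:
                Z[i][m] = predict_proba(X[i])
    return Z

def fit_logistic(Z, y, lr=0.5, epochs=500):
    """Binary logistic regression (the meta-learner) by batch gradient descent."""
    w, b, n = [0.0] * len(Z[0]), 0.0, len(Z)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for z, t in zip(Z, y):
            logit = sum(wi * zi for wi, zi in zip(w, z)) + b
            err = 1.0 / (1.0 + math.exp(-logit)) - t
            gw = [g + err * zi for g, zi in zip(gw, z)]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def predict_label(w, b, z):
    return 1 if sum(wi * zi for wi, zi in zip(w, z)) + b > 0 else 0

# Toy two-class data: each sample has two features, one per base model.
y = [i % 2 for i in range(20)]
X = [(t + 0.05 * ((i % 5) - 2), t + 0.1 * ((i % 3) - 1)) for i, t in enumerate(y)]

Z = out_of_fold_probs([make_centroid_model(0), make_centroid_model(1)], X, y, k=5)
w, b = fit_logistic(Z, y)
accuracy = sum(predict_label(w, b, z) == t for z, t in zip(Z, y)) / len(y)
print("meta-learner accuracy on the stacked features:", accuracy)
```

Each row of `Z` comes from base models trained on the other folds, which is how K-fold stacking limits the overfitting that would arise if the meta-learner were fitted on predictions for data the base models had already seen. In a real SER setting the stand-ins would be replaced by trained BLSTM and CNN models, and each would contribute one probability per emotion class rather than a single value.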
Pages: 4097-4115
Page count: 18
Related papers
  • [1] BLSTM and CNN Stacking Architecture for Speech Emotion Recognition
    Li, Dongdong
    Sun, Linyu
    Xu, Xinlei
    Wang, Zhe
    Zhang, Jing
    Du, Wenli
    [J]. NEURAL PROCESSING LETTERS, 2021, 53 (06) : 4097 - 4115
  • [2] Speech Emotion Recognition using XGBoost and CNN BLSTM with Attention
    He, Jingru
    Ren, Liyong
    [J]. 2021 IEEE SMARTWORLD, UBIQUITOUS INTELLIGENCE & COMPUTING, ADVANCED & TRUSTED COMPUTING, SCALABLE COMPUTING & COMMUNICATIONS, INTERNET OF PEOPLE, AND SMART CITY INNOVATIONS (SMARTWORLD/SCALCOM/UIC/ATC/IOP/SCI 2021), 2021, : 154 - 159
  • [3] Gender-Aware CNN-BLSTM for Speech Emotion Recognition
    Zhang, Linjuan
    Wang, Longbiao
    Dang, Jianwu
    Guo, Lili
    Yu, Qiang
    [J]. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2018, PT I, 2018, 11139 : 782 - 790
  • [4] A Combined CNN Architecture for Speech Emotion Recognition
    Begazo, Rolinson
    Aguilera, Ana
    Dongo, Irvin
    Cardinale, Yudith
    [J]. SENSORS, 2024, 24 (17)
  • [5] Speaker-Independent Speech Emotion Recognition Based on CNN-BLSTM and Multiple SVMs
    Liu, Zhen-Tao
    Xiao, Peng
    Li, Dan-Yun
    Hao, Man
    [J]. INTELLIGENT ROBOTICS AND APPLICATIONS, ICIRA 2019, PT III, 2019, 11742 : 481 - 491
  • [6] Bangla Speech Emotion Recognition and Cross-Lingual Study Using Deep CNN and BLSTM Networks
    Sultana, Sadia
    Iqbal, M. Zafar
    Selim, M. Reza
    Rashid, Md. Mijanur
    Rahman, M. Shahidur
    [J]. IEEE ACCESS, 2022, 10 : 564 - 578
  • [7] Multichannel CNN-BLSTM Architecture for Speech Emotion Recognition System by Fusion of Magnitude and Phase Spectral Features Using DCCA for Consumer Applications
    Prabhakar, Gudmalwar Ashishkumar
    Basel, Biplove
    Dutta, Anirban
    Rao, Ch. V. Rama
    [J]. IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 2023, 69 (02) : 226 - 235
  • [8] Speech Emotion Recognition Using CNN
    Huang, Zhengwei
    Dong, Ming
    Mao, Qirong
    Zhan, Yongzhao
    [J]. PROCEEDINGS OF THE 2014 ACM CONFERENCE ON MULTIMEDIA (MM'14), 2014, : 801 - 804
  • [9] End-to-End Mandarin Speech Recognition Combining CNN and BLSTM
    Wang, Dong
    Wang, Xiaodong
    Lv, Shaohe
    [J]. SYMMETRY-BASEL, 2019, 11 (05):
  • [10] Experimental Evaluation of CNN Architecture for Speech Recognition
    Haque, Md Amaan
    Verma, Abhishek
    Alex, John Sahaya Rani
    Venkatesan, Nithya
    [J]. FIRST INTERNATIONAL CONFERENCE ON SUSTAINABLE TECHNOLOGIES FOR COMPUTATIONAL INTELLIGENCE, 2020, 1045 : 507 - 514