Pre-trained Deep Convolution Neural Network Model With Attention for Speech Emotion Recognition

Cited by: 19
Authors
Zhang, Hua [1, 2]
Gou, Ruoyun [1 ]
Shang, Jili [1 ]
Shen, Fangyao [1 ]
Wu, Yifan [1, 3]
Dai, Guojun [1 ]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Comp Sci & Technol, Hangzhou, Peoples R China
[2] Zhejiang Univ, Key Lab Network Multimedia Technol Zhejiang Prov, Hangzhou, Peoples R China
[3] Hangzhou Dianzi Univ, Key Lab Brain Machine Collaborat Intelligence Zhe, Hangzhou, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
speech emotion recognition; deep convolutional neural network; attention mechanism; long short-term memory; deep neural network; FEATURES;
DOI
10.3389/fphys.2021.643202
Chinese Library Classification number
Q4 [Physiology]
Discipline classification code
071003
Abstract
Speech emotion recognition (SER) is a challenging task because of the affective variance between speakers. SER performance depends heavily on the features extracted from speech signals, and establishing an effective feature extraction and classification model remains difficult. In this paper, we propose a new method for SER based on a Deep Convolution Neural Network (DCNN) and a Bidirectional Long Short-Term Memory with Attention (BLSTMwA) model (DCNN-BLSTMwA). We first preprocess the speech samples by data enhancement and dataset balancing. Second, we extract three channels of log Mel-spectrograms (static, delta, and delta-delta) as the DCNN input. Then the DCNN model pre-trained on the ImageNet dataset is applied to generate segment-level features. We stack the features of a sentence into utterance-level features. Next, we adopt the BLSTM to learn high-level emotional features for temporal summarization, followed by an attention layer that focuses on emotionally relevant features. Finally, the learned high-level emotional features are fed into a Deep Neural Network (DNN) to predict the final emotion. Experiments on the EMO-DB and IEMOCAP databases obtain unweighted average recalls (UAR) of 87.86% and 68.50%, respectively, which are better than most popular SER methods and demonstrate the effectiveness of our proposed method.
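
Three of the building blocks named in the abstract can be illustrated with a minimal NumPy sketch: stacking static, delta, and delta-delta log Mel-spectrograms into a three-channel input, attention pooling over frame-level features, and the UAR metric. The abstract does not give the delta window, the attention formulation, or any layer sizes, so the function names, the regression window `width=2`, and the single scoring vector `w` below are assumptions chosen for illustration rather than the authors' implementation.

```python
import numpy as np

def delta(feat, width=2):
    """Regression-based delta along time. feat has shape (n_mels, n_frames).
    The window width is an assumed value, not taken from the paper."""
    n_frames = feat.shape[1]
    padded = np.pad(feat, ((0, 0), (width, width)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = np.zeros(feat.shape, dtype=float)
    for n in range(1, width + 1):
        ahead = padded[:, width + n : width + n + n_frames]
        behind = padded[:, width - n : width - n + n_frames]
        out += n * (ahead - behind)
    return out / denom

def three_channel(log_mel):
    """Stack static, delta, and delta-delta into shape (3, n_mels, n_frames)."""
    d1 = delta(log_mel)
    d2 = delta(d1)
    return np.stack([log_mel, d1, d2], axis=0)

def attention_pool(h, w):
    """Softmax-weighted sum over time. h: (T, D) frame features,
    w: (D,) scoring vector (hypothetical; learned in a real model)."""
    scores = h @ w
    scores = scores - scores.max()  # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()
    return alpha @ h, alpha

def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))
```

UAR averages recall per class rather than per sample, which is why it is commonly reported on class-imbalanced emotion corpora: a classifier that always predicts the majority class scores well on accuracy but poorly on UAR.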
Pages: 13