Pre-trained Deep Convolution Neural Network Model With Attention for Speech Emotion Recognition

Cited by: 19
Authors
Zhang, Hua [1, 2]
Gou, Ruoyun [1]
Shang, Jili [1]
Shen, Fangyao [1]
Wu, Yifan [1, 3]
Dai, Guojun [1]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Comp Sci & Technol, Hangzhou, Peoples R China
[2] Zhejiang Univ, Key Lab Network Multimedia Technol Zhejiang Prov, Hangzhou, Peoples R China
[3] Hangzhou Dianzi Univ, Key Lab Brain Machine Collaborat Intelligence Zhe, Hangzhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
speech emotion recognition; deep convolutional neural network; attention mechanism; long short-term memory; deep neural network; features;
DOI
10.3389/fphys.2021.643202
Chinese Library Classification (CLC) number
Q4 [Physiology];
Discipline code
071003;
Abstract
Speech emotion recognition (SER) is a difficult task because emotional expression varies considerably between speakers. SER performance depends heavily on the features extracted from the speech signal, and building an effective feature extraction and classification model remains challenging. In this paper, we propose a new SER method that combines a Deep Convolutional Neural Network (DCNN) with a Bidirectional Long Short-Term Memory network with Attention (BLSTMwA), referred to as DCNN-BLSTMwA. First, we preprocess the speech samples by data augmentation and dataset balancing. Second, we extract three-channel log Mel-spectrograms (static, delta, and delta-delta) as the DCNN input. Third, a DCNN pre-trained on the ImageNet dataset generates segment-level features, and the segment-level features of each utterance are stacked into utterance-level features. Next, a BLSTM learns high-level emotional features for temporal summarization, followed by an attention layer that focuses on emotionally relevant features. Finally, the learned high-level emotional features are fed into a Deep Neural Network (DNN) to predict the emotion. Experiments on the EMO-DB and IEMOCAP databases yield unweighted average recalls (UAR) of 87.86% and 68.50%, respectively, outperforming most popular SER methods and demonstrating the effectiveness of the proposed method.
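The three-channel input described in the abstract stacks a static log Mel-spectrogram with its first- and second-order temporal derivatives, presumably so that the ImageNet-pretrained DCNN receives three channels analogous to RGB. The abstract does not say how the deltas are computed, so the sketch below assumes the common HTK-style regression formula with a window of N = 2 frames and edge-frame replication; the function names `delta` and `three_channel` are hypothetical, and the log-Mel spectrogram itself is assumed to be computed already (frames × Mel bins). Pure Python is used only to keep the example self-contained.

```python
from typing import List

# A log-Mel spectrogram as a list of frames, each frame a list of Mel-bin values.
Matrix = List[List[float]]


def delta(feats: Matrix, n: int = 2) -> Matrix:
    """Regression delta over time (assumed HTK-style formula, window +/- n frames).

    d_t = sum_{k=1..n} k * (c_{t+k} - c_{t-k}) / (2 * sum_{k=1..n} k^2)
    Frames beyond either end are replaced by the nearest edge frame.
    """
    num_frames = len(feats)
    denom = 2 * sum(k * k for k in range(1, n + 1))
    out: Matrix = []
    for t in range(num_frames):
        row = []
        for b in range(len(feats[t])):
            acc = 0.0
            for k in range(1, n + 1):
                nxt = feats[min(t + k, num_frames - 1)][b]  # clamp at last frame
                prv = feats[max(t - k, 0)][b]               # clamp at first frame
                acc += k * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


def three_channel(log_mel: Matrix) -> List[Matrix]:
    """Stack static, delta and delta-delta maps into a 3-channel input."""
    d1 = delta(log_mel)  # first-order temporal derivative
    d2 = delta(d1)       # second-order derivative (delta of the delta)
    return [log_mel, d1, d2]


# Usage: a toy 10-frame x 4-bin "spectrogram" that rises linearly over time.
ramp = [[float(t)] * 4 for t in range(10)]
channels = three_channel(ramp)
print(channels[1][5][0])  # 1.0, the slope of the ramp away from the edges
```

Each channel keeps the frames × bins shape of the input, so the three maps can be fed to a 2-D CNN as one image-like tensor.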
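The abstract describes a BLSTM for temporal summarization followed by an attention layer that emphasizes emotionally relevant frames. It does not give the scoring function, so the sketch below assumes a simple form: each BLSTM output (forward and backward states concatenated) is scored by a dot product with a learned vector `w`, the scores are softmax-normalized over time, and the outputs are averaged with those weights. The BLSTM itself is not implemented; its state sequences are taken as given, and `w`, `attention_pool` and `concat_bidirectional` are hypothetical names for illustration.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def concat_bidirectional(fwd: List[Vector], bwd: List[Vector]) -> List[Vector]:
    """Concatenate forward and backward LSTM states frame by frame."""
    return [f + b for f, b in zip(fwd, bwd)]


def attention_pool(states: List[Vector], w: Vector) -> Tuple[Vector, Vector]:
    """Weight each time step by softmax(w . h_t) and return the weighted sum.

    `w` stands in for a scoring vector that would be learned jointly with
    the network; here it is simply passed in.
    """
    scores = [sum(wi * hi for wi, hi in zip(w, h)) for h in states]
    alphas = softmax(scores)  # one weight per time step, summing to 1
    dim = len(states[0])
    pooled = [sum(a * h[d] for a, h in zip(alphas, states)) for d in range(dim)]
    return pooled, alphas


# Usage: two time steps; with a zero scoring vector both get equal weight.
pooled, alphas = attention_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
print(alphas, pooled)  # [0.5, 0.5] [0.5, 0.5]
```

The pooled vector is a fixed-length summary of a variable-length utterance, which is the kind of input a DNN classifier can consume directly.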
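Results are reported as unweighted average recall (UAR): the recall of each emotion class, averaged with equal weight per class. Unlike plain accuracy, UAR is not inflated by a classifier that favors the majority class, which matters for unevenly distributed emotion corpora. A minimal sketch follows; `unweighted_average_recall` is a hypothetical helper name, and the average is taken over the classes present in the ground-truth labels.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def unweighted_average_recall(y_true: Sequence[Hashable],
                              y_pred: Sequence[Hashable]) -> float:
    """Mean of per-class recall over the classes that occur in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one sample")
    support = defaultdict(int)  # number of true samples per class
    hits = defaultdict(int)     # number of correctly predicted samples per class
    for t, p in zip(y_true, y_pred):
        support[t] += 1
        if t == p:
            hits[t] += 1
    recalls = [hits[c] / support[c] for c in support]
    return sum(recalls) / len(recalls)


# Usage: an imbalanced toy set where the classifier always predicts "angry".
y_true = ["angry"] * 8 + ["sad"] * 2
y_pred = ["angry"] * 10
print(unweighted_average_recall(y_true, y_pred))  # 0.5
```

In this example accuracy is 0.8, but UAR is 0.5 because the "sad" class has zero recall.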
Pages: 13