Pre-trained Deep Convolution Neural Network Model With Attention for Speech Emotion Recognition

Cited by: 19
Authors
Zhang, Hua [1 ,2 ]
Gou, Ruoyun [1 ]
Shang, Jili [1 ]
Shen, Fangyao [1 ]
Wu, Yifan [1 ,3 ]
Dai, Guojun [1 ]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Comp Sci & Technol, Hangzhou, Peoples R China
[2] Zhejiang Univ, Key Lab Network Multimedia Technol Zhejiang Prov, Hangzhou, Peoples R China
[3] Hangzhou Dianzi Univ, Key Lab Brain Machine Collaborat Intelligence Zhe, Hangzhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
speech emotion recognition; deep convolutional neural network; attention mechanism; long short-term memory; deep neural network; FEATURES;
DOI
10.3389/fphys.2021.643202
Chinese Library Classification (CLC) number
Q4 [Physiology];
Discipline classification code
071003;
Abstract
Speech emotion recognition (SER) is a challenging task because emotional expression varies between speakers. SER performance depends heavily on the features extracted from speech signals, and building an effective feature extraction and classification model remains difficult. In this paper, we propose a new method for SER based on a Deep Convolution Neural Network (DCNN) and a Bidirectional Long Short-Term Memory with Attention (BLSTMwA) model (DCNN-BLSTMwA). First, we preprocess the speech samples with data enhancement and dataset balancing. Second, we extract three-channel log Mel-spectrograms (static, delta, and delta-delta) as the DCNN input. Then a DCNN model pre-trained on the ImageNet dataset is applied to generate segment-level features, and the features of each sentence are stacked into utterance-level features. Next, we adopt a BLSTM to learn high-level emotional features for temporal summarization, followed by an attention layer that focuses on emotionally relevant features. Finally, the learned high-level emotional features are fed into a Deep Neural Network (DNN) to predict the final emotion. Experiments on the EMO-DB and IEMOCAP databases obtain an unweighted average recall (UAR) of 87.86% and 68.50%, respectively, which is better than most popular SER methods and demonstrates the effectiveness of the proposed method.
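Three steps named in the abstract can be illustrated with a minimal NumPy sketch: stacking static, delta, and delta-delta log Mel-spectrograms into a three-channel input, attention-weighted pooling over frame-level features, and the UAR metric. This is not the authors' implementation. The function names are hypothetical, the deltas use `np.gradient` as a simple stand-in for the regression-based deltas found in speech toolkits, and the attention score is assumed to be a dot product with a single vector `w`. The pre-trained DCNN and the BLSTM are omitted.

```python
import numpy as np

def three_channel_input(log_mel):
    """Stack static, delta, and delta-delta features as channels.

    log_mel: array of shape (n_mels, n_frames).
    Assumption: deltas are finite differences along the time axis.
    Returns an array of shape (3, n_mels, n_frames).
    """
    delta = np.gradient(log_mel, axis=1)
    delta2 = np.gradient(delta, axis=1)
    return np.stack([log_mel, delta, delta2], axis=0)

def attention_pool(h, w):
    """Pool frame-level vectors h (T, D) into one vector of shape (D,).

    Assumption: each frame is scored by a dot product with w (D,),
    and the scores are normalized with a softmax over time.
    """
    scores = h @ w
    scores = scores - scores.max()      # subtract max for numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()
    return alpha @ h, alpha

def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))
```

Because UAR averages recall per class, a classifier that always predicts the majority class scores poorly on an imbalanced set even when its plain accuracy is high, which is why the metric suits emotion corpora with uneven class sizes.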
Pages: 13
Related papers
50 in total
  • [41] Improvement of Speech Emotion Recognition by Deep Convolutional Neural Network and Speech Features
    Mohanty, Aniruddha
    Cherukuri, Ravindranath C.
    Prusty, Alok Ranjan
    THIRD CONGRESS ON INTELLIGENT SYSTEMS, CIS 2022, VOL 1, 2023, 608 : 117 - 129
  • [42] Teaming Up Pre-Trained Deep Neural Networks
    Deabes, Wael
    Abdel-Hakim, Alaa E.
    2018 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND INFORMATION SECURITY (ICSPIS), 2018, : 73 - 76
  • [43] Recognition of speech emotion using custom 2D-convolution neural network deep learning algorithm
    Zvarevashe, Kudakwashe
    Olugbara, Oludayo O.
    INTELLIGENT DATA ANALYSIS, 2020, 24 (05) : 1065 - 1086
  • [44] A novel framework using binary attention mechanism based deep convolution neural network for face emotion recognition
    G R.P.
    K K.
    Measurement: Sensors, 2023, 30
  • [45] ON THE USE OF SELF-SUPERVISED PRE-TRAINED ACOUSTIC AND LINGUISTIC FEATURES FOR CONTINUOUS SPEECH EMOTION RECOGNITION
    Macary, Manon
    Tahon, Marie
    Esteve, Yannick
    Rousseau, Anthony
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 373 - 380
  • [46] GENERATING HUMAN READABLE TRANSCRIPT FOR AUTOMATIC SPEECH RECOGNITION WITH PRE-TRAINED LANGUAGE MODEL
    Liao, Junwei
    Shi, Yu
    Gong, Ming
    Shou, Linjun
    Eskimez, Sefik
    Lu, Liyang
    Qu, Hong
    Zeng, Michael
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7578 - 7582
  • [47] Automated algorithm selection using meta-learning and pre-trained deep convolution neural networks
    Dagan, Itai
    Vainshtein, Roman
    Katz, Gilad
    Rokach, Lior
    INFORMATION FUSION, 2024, 105
  • [48] A Deep Convolution Neural Network Model for Vehicle Recognition and Face Recognition
    Luo, Xingcheng
    Shen, Ruihan
    Hu, Jian
    Deng, Jianhua
    Hu, Linji
    Guan, Qing
    ADVANCES IN INFORMATION AND COMMUNICATION TECHNOLOGY, 2017, 107 : 715 - 720
  • [49] Bridging the Gap: Integrating Pre-trained Speech Enhancement and Recognition Models for Robust Speech Recognition
    Wang, Kuan-Chen
    Li, You-Jin
    Chen, Wei-Lun
    Chen, Yu-Wen
    Wang, Yi-Ching
    Yeh, Ping-Cheng
    Zhang, Chao
    Tsao, Yu
    32ND EUROPEAN SIGNAL PROCESSING CONFERENCE, EUSIPCO 2024, 2024, : 426 - 430
  • [50] Active Learning for Speech Emotion Recognition Using Deep Neural Network
    Abdelwahab, Mohammed
    Busso, Carlos
    2019 8TH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII), 2019,