Convolutional-Recurrent Neural Networks With Multiple Attention Mechanisms for Speech Emotion Recognition

Cited by: 10
Authors
Jiang, Pengxu [1]
Xu, Xinzhou [2]
Tao, Huawei [3]
Zhao, Li [1]
Zou, Cairong [1]
Affiliations
[1] Southeast University, School of Information Science and Engineering, Nanjing 210096, People's Republic of China
[2] Nanjing University of Posts and Telecommunications, School of Internet of Things, Nanjing 210023, People's Republic of China
[3] Henan University of Technology, College of Information Science and Technology, Zhengzhou 450001, People's Republic of China
Keywords
Convolutional neural networks (CNNs); long short-term memory (LSTM); multiple attention mechanisms; speech emotion recognition (SER); features; representations; model
DOI
10.1109/TCDS.2021.3123979
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Speech emotion recognition (SER) aims to endow machines with the ability to perceive latent affective components in speech. However, because of their structures, existing deep-learning-based SER methods have difficulty jointly considering the time-frequency and sequential information in speech, which may limit their ability to learn reasonable local emotional representations. To address this, this article proposes a convolutional-recurrent neural network with multiple attention mechanisms (CRNN-MA) for SER. The CRNN-MA consists of parallel convolutional neural network (CNN) and long short-term memory (LSTM) modules, which take extracted Mel spectrograms and frame-level features as inputs, respectively, so that time-frequency and sequential information are captured simultaneously. Furthermore, three strategies are adopted in the proposed CRNN-MA: 1) a multiple self-attention layer in the CNN module that assigns weights at the frame level; 2) a multidimensional attention layer applied to the input features of the LSTM; and 3) a fusion layer that combines the features of the two modules. Experimental results on three commonly used SER corpora demonstrate the effectiveness of the convolutional-recurrent and multiple-attention modules of the proposed approach in comparison with other related models and existing state-of-the-art approaches.
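The abstract names three attention-related components without giving their equations. The NumPy sketch below illustrates one plausible reading, under loudly stated assumptions: the CNN and LSTM branches themselves are replaced by random stand-in arrays; "multiple self-attention" is approximated by multi-head attention pooling over frames; "multidimensional attention" is approximated by feature-wise attention (a separate softmax over time for each feature dimension); and fusion is assumed to be concatenation followed by a linear softmax classifier. All function names, shapes, and weights are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)


def multi_head_attention_pool(h, queries):
    """Assumed stand-in for the CNN-side "multiple self-attention" layer.

    h: (T, d) frame-level features from the CNN branch.
    queries: (K, d) hypothetical learned query vectors, one per head.
    Each head scores every frame, normalises the scores over frames,
    and takes a weighted sum of frames; the K results are concatenated.
    """
    scores = h @ queries.T               # (T, K): one score per frame and head
    alpha = softmax(scores, axis=0)      # each head's weights sum to 1 over frames
    pooled = alpha.T @ h                 # (K, d): weighted sum of frames per head
    return pooled.reshape(-1), alpha     # flattened to length K * d


def multidimensional_attention(x, w):
    """Assumed feature-wise attention on the LSTM input sequence.

    x: (T, d) frame-level features; w: (d, d) hypothetical projection.
    Scores are computed per frame and per feature, normalised over time
    separately for each feature, and used to rescale x elementwise.
    """
    scores = np.tanh(x @ w)              # (T, d) feature-wise scores
    alpha = softmax(scores, axis=0)      # each column sums to 1 over time
    return x * alpha, alpha              # reweighted sequence for the LSTM


def fuse(cnn_vec, lstm_vec, w_out):
    """Assumed fusion: concatenate branch summaries, then a linear softmax layer."""
    z = np.concatenate([cnn_vec, lstm_vec])
    return softmax(w_out @ z)


rng = np.random.default_rng(0)
T_cnn, d_cnn = 40, 16    # hypothetical: frames x channels from the CNN branch
T_lstm, d_lstm = 50, 12  # hypothetical: frames x frame-level features
K, n_classes = 4, 4      # hypothetical: attention heads, emotion classes

h_cnn = rng.standard_normal((T_cnn, d_cnn))
x_lstm = rng.standard_normal((T_lstm, d_lstm))

cnn_vec, cnn_alpha = multi_head_attention_pool(h_cnn, rng.standard_normal((K, d_cnn)))
x_att, md_alpha = multidimensional_attention(x_lstm, rng.standard_normal((d_lstm, d_lstm)))
lstm_vec = x_att.sum(axis=0)  # stand-in for the LSTM output (no LSTM is run here)
w_out = rng.standard_normal((n_classes, K * d_cnn + d_lstm))
probs = fuse(cnn_vec, lstm_vec, w_out)
print(probs.shape, round(float(probs.sum()), 6))
```

In a trained model the queries, projections, and classifier weights would be learned jointly with the CNN and LSTM branches; the sketch only shows how frame-level and feature-level weights can be normalised and how the two branch summaries might be combined.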
Pages: 1564-1573
Number of pages: 10
Related papers (50 in total)
  • [21] FSER: Deep Convolutional Neural Networks for Speech Emotion Recognition
    Dossou, Bonaventure F. P.
    Gbenou, Yeno K. S.
    2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW 2021), 2021, pp. 3526-3531.
  • [22] Multiple Convolutional Neural Networks in EEG Emotion Recognition
    Khairunissa, Hana Dwi
    Djamal, Esmeralda Contessa
    Wulandari, Arlisa
    2021 4th International Conference on Computer and Informatics Engineering (IC2IE 2021), 2021, pp. 30-35.
  • [23] Ensemble Learning With Attention-Integrated Convolutional Recurrent Neural Network for Imbalanced Speech Emotion Recognition
    Ai, Xusheng
    Sheng, Victor S.
    Fang, Wei
    Ling, Charles X.
    Li, Chunhua
    IEEE Access, 2020, vol. 8, pp. 199909-199919.
  • [24] Multi-Channel 2-D Convolutional Recurrent Neural Networks for Speech Emotion Recognition
    Zhou, Weidong
    Zhou, Houpan
    Xia, Pengfei
    2020 Chinese Automation Congress (CAC 2020), 2020, pp. 5884-5889.
  • [25] Multiview Feature Fusion Attention Convolutional Recurrent Neural Networks for EEG-Based Emotion Recognition
    Xin, Ruihao
    Miao, Fengbo
    Cong, Ping
    Zhang, Fan
    Xin, Yongxian
    Feng, Xin
    Journal of Sensors, 2023, vol. 2023.
  • [26] Multimodal Classification with Deep Convolutional-Recurrent Neural Networks for Electroencephalography
    Tan, Chuanqi
    Sun, Fuchun
    Zhang, Wenchang
    Chen, Jianhua
    Liu, Chunfang
    Neural Information Processing (ICONIP 2017), Pt. II, 2017, vol. 10635, pp. 767-776.
  • [27] Deep Convolutional Neural Networks for Feature Extraction in Speech Emotion Recognition
    Heracleous, Panikos
    Mohammad, Yasser
    Yoneyama, Akio
    Human-Computer Interaction. Recognition and Interaction Technologies, HCI 2019, Pt. II, 2019, vol. 11567, pp. 117-132.
  • [28] Improvement on Speech Emotion Recognition Based on Deep Convolutional Neural Networks
    Niu, Yafeng
    Zou, Dongsheng
    Niu, Yadong
    He, Zhongshi
    Tan, Hua
    Proceedings of 2018 International Conference on Computing and Artificial Intelligence (ICCAI 2018), 2018, pp. 13-18.
  • [29] Parallelized Convolutional Recurrent Neural Network With Spectral Features for Speech Emotion Recognition
    Jiang, Pengxu
    Fu, Hongliang
    Tao, Huawei
    Lei, Peizhi
    Zhao, Li
    IEEE Access, 2019, vol. 7, pp. 90368-90377.
  • [30] Convolutional Attention Networks for Multimodal Emotion Recognition from Speech and Text Data
    Lee, Chan Woo
    Song, Kyu Ye
    Jeong, Jihoon
    Choi, Woo Yong
    First Grand Challenge and Workshop on Human Multimodal Language (Challenge-HML), 2018, pp. 28-34.