Convolutional-Recurrent Neural Networks With Multiple Attention Mechanisms for Speech Emotion Recognition

Cited by: 10
Authors
Jiang, Pengxu [1 ]
Xu, Xinzhou [2 ]
Tao, Huawei [3 ]
Zhao, Li [1 ]
Zou, Cairong [1 ]
Affiliations
[1] Southeast Univ, Sch Informat Sci & Engn, Nanjing 210096, Peoples R China
[2] Nanjing Univ Posts & Telecommun, Sch Internet Things, Nanjing 210023, Peoples R China
[3] Henan Univ Technol, Coll Informat Sci & Technol, Zhengzhou 450001, Peoples R China
Keywords
Convolutional neural networks (CNNs); long short-term memory (LSTM); multiple attention mechanisms; speech emotion recognition (SER); FEATURES; REPRESENTATIONS; MODEL;
DOI
10.1109/TCDS.2021.3123979
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Speech emotion recognition (SER) aims to endow machines with the intelligence in perceiving latent affective components from speech. However, the existing works on deep-learning-based SER make it difficult to jointly consider time-frequency and sequential information in speech due to their structures, which may lead to deficiencies in exploring reasonable local emotional representations. In this regard, we propose a convolutional-recurrent neural network with multiple attention mechanisms (CRNN-MAs) for SER in this article, including the paralleled convolutional neural network (CNN) and long short-term memory (LSTM) modules, using extracted Mel-spectrums and frame-level features, respectively, in order to acquire time-frequency and sequential information simultaneously. Furthermore, we set three strategies for the proposed CRNN-MA: 1) a multiple self-attention layer in the CNN module on frame-level weights; 2) a multidimensional attention layer as the input features of the LSTM; and 3) a fusion layer summarizing the features of the two modules. Experimental results on three conventional SER corpora demonstrate the effectiveness of the proposed approach through using the convolutional-recurrent and multiple-attention modules, compared with other related models and existing state-of-the-art approaches.
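The abstract describes two parallel branches whose frame-level outputs are weighted by attention and then merged in a fusion layer. The following is only a minimal illustrative sketch of that general idea, not the authors' CRNN-MA implementation: no CNN or LSTM is built here, the function names (`softmax`, `attention_pool`, `fuse`) are hypothetical, the fixed `query` vector stands in for parameters that would normally be learned, and fusion by concatenation is an assumption, since the abstract does not say how the fusion layer combines the two modules.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames, query):
    """Pool a sequence of frame vectors into one vector.

    Each frame is scored by its dot product with `query` (a stand-in
    for learned attention parameters); the scores are normalized with
    softmax and used as frame-level weights in a weighted sum.
    """
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    pooled = [
        sum(w * frame[d] for w, frame in zip(weights, frames))
        for d in range(dim)
    ]
    return pooled, weights


def fuse(cnn_vec, lstm_vec):
    """Assumed fusion step: concatenate the two branch summaries."""
    return list(cnn_vec) + list(lstm_vec)


# Toy stand-ins for per-frame outputs of the two branches (assumed data).
cnn_frames = [[0.2, 0.1], [0.9, 0.4], [0.3, 0.8]]
lstm_frames = [[0.5, 0.5, 0.1], [0.1, 0.7, 0.2], [0.6, 0.2, 0.9]]

cnn_summary, cnn_weights = attention_pool(cnn_frames, query=[1.0, 0.0])
lstm_summary, lstm_weights = attention_pool(lstm_frames, query=[0.0, 1.0, 0.0])
fused = fuse(cnn_summary, lstm_summary)  # would feed a classifier in a full model
```

In a real model the query would be trained jointly with the branches, and the fused vector would be passed to an emotion classifier; both are omitted here.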
Pages: 1564-1573
Page count: 10
Related papers
50 in total
  • [1] Multiple attention convolutional-recurrent neural networks for speech emotion recognition
    Zhang, Zhihao
    Wang, Kunxia
    [J]. 2022 10TH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION WORKSHOPS AND DEMOS, ACIIW, 2022,
  • [2] Speech Emotion Recognition Using Convolutional-Recurrent Neural Networks with Attention Model
    Mu, Yawei
    Gomez, Hernandez
    Cano Montes, Antonio
    Alcaraz Martinez, Carlos
    Wang, Xuetian
    Gao, Hongmin
    [J]. 2ND INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING, INFORMATION SCIENCE AND INTERNET TECHNOLOGY, CII 2017, 2017, : 341 - 350
  • [3] Hybrid data augmentation and deep attention-based dilated convolutional-recurrent neural networks for speech emotion recognition
    Pham, Nhat Truong
    Dang, Duc Ngoc Minh
    Nguyen, Ngoc Duy
    Nguyen, Thanh Thi
    Nguyen, Hai
    Manavalan, Balachandran
    Lim, Chee Peng
    Nguyen, Sy Dzung
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2023, 230
  • [4] EEG Emotion Recognition using Parallel Hybrid Convolutional-Recurrent Neural Networks
    Putri, Nursilva Aulianisa
    Djamal, Esmeralda Contessa
    Nugraha, Fikri
    Kasyidi, Fatan
    [J]. 2022 INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ITS APPLICATIONS (ICODSA), 2022, : 24 - 29
  • [5] CONVOLUTIONAL-RECURRENT NEURAL NETWORKS FOR SPEECH ENHANCEMENT
    Zhao, Han
    Zarar, Shuayb
    Tashev, Ivan
    Lee, Chin-Hui
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 2401 - 2405
  • [6] IMPROVING CONVOLUTIONAL RECURRENT NEURAL NETWORKS FOR SPEECH EMOTION RECOGNITION
    Meyer, Patrick
    Xu, Ziyi
    Fingscheidt, Tim
    [J]. 2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 365 - 372
  • [7] Speech Emotion Recognition using Convolutional and Recurrent Neural Networks
    Lim, Wootaek
    Jang, Daeyoung
    Lee, Taejin
    [J]. 2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2016,
  • [8] 3-D Convolutional Recurrent Neural Networks With Attention Model for Speech Emotion Recognition
    Chen, Mingyi
    He, Xuanji
    Yang, Jing
    Zhang, Han
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2018, 25 (10) : 1440 - 1444
  • [9] Speech Emotion Recognition Using Convolutional Neural Networks with Attention Mechanism
    Mountzouris, Konstantinos
    Perikos, Isidoros
    Hatzilygeroudis, Ioannis
    Corchado, Juan M.
    Iglesias, Carlos A.
    Kim, Byung-Gyu
    Mehmood, Rashid
    Ren, Fuji
    Lee, In
    [J]. ELECTRONICS, 2023, 12 (20)
  • [10] Speech Emotion Recognition using Convolutional Recurrent Neural Networks and Spectrograms
    Qamhan, Mustafa A.
    Meftah, Ali H.
    Selouani, Sid-Ahmed
    Alotaibi, Yousef A.
    Zakariah, Mohammed
    Seddiq, Yasser Mohammad
    [J]. 2020 IEEE CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING (CCECE), 2020,