Multiple attention convolutional-recurrent neural networks for speech emotion recognition

Cited by: 0
Authors
Zhang, Zhihao [1]
Wang, Kunxia [2]
Affiliations
[1] Anhui Jianzhu Univ, Sch Elect & Informat Engn, Anhui Int Joint Res Ctr Ancient Architecture Inte, Hefei, Peoples R China
[2] Anhui Jianzhu Univ, Sch Elect & Informat Engn, Key Lab Architectural Acoust Environm Anhui Highe, Hefei, Peoples R China
Keywords
Speech emotion recognition; Multiple attention mechanisms; Human-computer interaction;
DOI
10.1109/ACIIW57231.2022.10086021
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Speech emotion recognition (SER) is of great significance in human-computer interaction and affective computing. One of the major challenges for SER lies in extracting effective emotional features from lengthy utterances. Most existing deep-learning-based SER systems adopt Log-Mel spectrograms as the model input, which alone cannot fully convey the emotional information in speech. Furthermore, the limited feature-extraction ability of a model may make it difficult to extract key emotional representations. To address these issues, we propose a new convolutional recurrent network based on multiple attention mechanisms. It comprises convolutional neural network (CNN) and bidirectional long short-term memory (BiLSTM) modules, which use extracted Mel-spectrum and Fourier coefficient features, respectively, to complement the emotional information. The multiple attention mechanisms in our model are as follows: spatial attention and channel attention are added to the CNN module to focus on the key emotional areas and locate more effective features, and temporal attention assigns weights to the features of different time segments after the BiLSTM extracts sequence information. Experimental results show that the model achieves a weighted accuracy (WA) of 87.9%, 76.5%, and 75.2% and an unweighted accuracy (UA) of 87.6%, 73.5%, and 70.1% on the EMODB, IEMOCAP, and EESDB speech datasets, respectively, which is better than most state-of-the-art methods.
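The abstract names two ideas that a short sketch may help make concrete: temporal attention over BiLSTM outputs, and the WA/UA metrics. The Python below is an illustration under stated assumptions, not the authors' implementation. It assumes a common dot-product formulation of temporal attention (a softmax over per-frame scores), with `w` as a hypothetical learned scoring vector, and the definitions of WA and UA usual in SER work (overall accuracy and mean per-class recall), which the abstract does not spell out. The function names are illustrative.

```python
import math
from collections import defaultdict


def temporal_attention_pool(frames, w):
    """Pool T per-frame feature vectors into one vector using softmax weights.

    frames: list of T feature vectors (e.g. BiLSTM outputs), each of length D.
    w: hypothetical learned scoring vector of length D (an assumption here).
    """
    # One scalar relevance score per time step.
    scores = [sum(wi * hi for wi, hi in zip(w, h)) for h in frames]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]  # attention weights, sum to 1
    dim = len(frames[0])
    # Weighted sum of the frames, one output value per feature dimension.
    pooled = [sum(a * h[d] for a, h in zip(alphas, frames)) for d in range(dim)]
    return alphas, pooled


def wa_ua(y_true, y_pred):
    """Return (WA, UA): overall accuracy and mean per-class recall."""
    correct = defaultdict(int)
    count = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        count[t] += 1
        if t == p:
            correct[t] += 1
    wa = sum(correct.values()) / len(y_true)
    ua = sum(correct[c] / count[c] for c in count) / len(count)
    return wa, ua
```

Under these definitions WA and UA diverge when classes are imbalanced: a model that is strong on a frequent emotion but weak on a rare one scores higher on WA than on UA, which is one reason results of this kind are often reported with both.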
Pages: 8
Related papers
  • [1] Convolutional-Recurrent Neural Networks With Multiple Attention Mechanisms for Speech Emotion Recognition
    Jiang, Pengxu
    Xu, Xinzhou
    Tao, Huawei
    Zhao, Li
    Zou, Cairong
    [J]. IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2022, 14 (04) : 1564 - 1573
  • [2] Speech Emotion Recognition Using Convolutional-Recurrent Neural Networks with Attention Model
    Mu, Yawei
    Gomez, Hernandez
    Cano Montes, Antonio
    Alcaraz Martinez, Carlos
    Wang, Xuetian
    Gao, Hongmin
    [J]. 2ND INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING, INFORMATION SCIENCE AND INTERNET TECHNOLOGY, CII 2017, 2017, : 341 - 350
  • [3] Hybrid data augmentation and deep attention-based dilated convolutional-recurrent neural networks for speech emotion recognition
    Pham, Nhat Truong
    Dang, Duc Ngoc Minh
    Nguyen, Ngoc Duy
    Nguyen, Thanh Thi
    Nguyen, Hai
    Manavalan, Balachandran
    Lim, Chee Peng
    Nguyen, Sy Dzung
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2023, 230
  • [4] EEG Emotion Recognition using Parallel Hybrid Convolutional-Recurrent Neural Networks
    Putri, Nursilva Aulianisa
    Djamal, Esmeralda Contessa
    Nugraha, Fikri
    Kasyidi, Fatan
    [J]. 2022 INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ITS APPLICATIONS (ICODSA), 2022, : 24 - 29
  • [5] CONVOLUTIONAL-RECURRENT NEURAL NETWORKS FOR SPEECH ENHANCEMENT
    Zhao, Han
    Zarar, Shuayb
    Tashev, Ivan
    Lee, Chin-Hui
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 2401 - 2405
  • [6] IMPROVING CONVOLUTIONAL RECURRENT NEURAL NETWORKS FOR SPEECH EMOTION RECOGNITION
    Meyer, Patrick
    Xu, Ziyi
    Fingscheidt, Tim
    [J]. 2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 365 - 372
  • [7] Speech Emotion Recognition using Convolutional and Recurrent Neural Networks
    Lim, Wootaek
    Jang, Daeyoung
    Lee, Taejin
    [J]. 2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2016
  • [8] 3-D Convolutional Recurrent Neural Networks With Attention Model for Speech Emotion Recognition
    Chen, Mingyi
    He, Xuanji
    Yang, Jing
    Zhang, Han
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2018, 25 (10) : 1440 - 1444
  • [9] Speech Emotion Recognition Using Convolutional Neural Networks with Attention Mechanism
    Mountzouris, Konstantinos
    Perikos, Isidoros
    Hatzilygeroudis, Ioannis
    Corchado, Juan M.
    Iglesias, Carlos A.
    Kim, Byung-Gyu
    Mehmood, Rashid
    Ren, Fuji
    Lee, In
    [J]. ELECTRONICS, 2023, 12 (20)
  • [10] Speech Emotion Recognition using Convolutional Recurrent Neural Networks and Spectrograms
    Qamhan, Mustafa A.
    Meftah, Ali H.
    Selouani, Sid-Ahmed
    Alotaibi, Yousef A.
    Zakariah, Mohammed
    Seddiq, Yasser Mohammad
    [J]. 2020 IEEE CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING (CCECE), 2020