IMPROVING CONVOLUTIONAL RECURRENT NEURAL NETWORKS FOR SPEECH EMOTION RECOGNITION

被引:8
|
作者
Meyer, Patrick [1 ]
Xu, Ziyi [1 ]
Fingscheidt, Tim [1 ]
机构
[1] Tech Univ Carolo Wilhelmina Braunschweig, Inst Commun Technol, D-38106 Braunschweig, Germany
关键词
Speech emotion recognition; machine learning; log-mel spectrogram; BLSTM; FEATURES;
D O I
10.1109/SLT48900.2021.9383513
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Deep learning has increased the interest in speech emotion recognition (SER) and has put forth diverse structures and methods to improve performance. In recent years it has turned out that applying SER on a (log-mel) spectrogram and thus, interpreting SER as an image recognition task is a promising method. Following the trend towards using a convolutional neural network (CNN) in combination with a bidirectional long short-term memory (BLSTM) layer, and some subsequent fully connected layers, in this work, we advance the performance of this topology by several contributions: We integrate a multi-kernel width CNN, propose a BLSTM output summarization function, apply an enhanced feature representation, and introduce an effective training method. In order to foster insight into our proposed methods, we separately evaluate the impact of each modification in an ablation study. Based on our modifications, we obtain top results for this type of topology on IEMOCAP with an unweighted average recall of 64.5% on average.
引用
收藏
页码:365 / 372
页数:8
相关论文
共 50 条
  • [1] Speech Emotion Recognition using Convolutional and Recurrent Neural Networks
    Lim, Wootaek
    Jang, Daeyoung
    Lee, Taejin
    [J]. 2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2016,
  • [2] Speech Emotion Recognition using Convolutional Recurrent Neural Networks and Spectrograms
    Qamhan, Mustafa A.
    Meftah, Ali H.
    Selouani, Sid-Ahmed
    Alotaibi, Yousef A.
    Zakariah, Mohammed
    Seddiq, Yasser Mohammad
    [J]. 2020 IEEE CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING (CCECE), 2020,
  • [3] Multiple attention convolutional-recurrent neural networks for speech emotion recognition
    Zhang, Zhihao
    Wang, Kunxia
    [J]. 2022 10TH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION WORKSHOPS AND DEMOS, ACIIW, 2022,
  • [4] COMPACT CONVOLUTIONAL RECURRENT NEURAL NETWORKS VIA BINARIZATION FOR SPEECH EMOTION RECOGNITION
    Zhao, Huan
    Xiao, Yufeng
    Han, Jing
    Zhang, Zixing
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6690 - 6694
  • [5] Continuous speech emotion recognition with convolutional neural networks
    Vryzas, Nikolaos
    Vrysis, Lazaros
    Matsiola, Maria
    Kotsakis, Rigas
    Dimoulas, Charalampos
    Kalliris, George
    [J]. AES: Journal of the Audio Engineering Society, 2020, 68 (1-2): : 14 - 24
  • [6] Continuous Speech Emotion Recognition with Convolutional Neural Networks
    Vryzas, Nikolaos
    Vrysis, Lazaros
    Matsiola, Maria
    Kotsakis, Rigas
    Dimoulas, Charalampos
    Kalliris, George
    [J]. JOURNAL OF THE AUDIO ENGINEERING SOCIETY, 2020, 68 (1-2): : 14 - 24
  • [7] Speech emotion recognition with deep convolutional neural networks
    Issa, Dias
    Demirci, M. Fatih
    Yazici, Adnan
    [J]. BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2020, 59
  • [8] Convolutional-Recurrent Neural Networks With Multiple Attention Mechanisms for Speech Emotion Recognition
    Jiang, Pengxu
    Xu, Xinzhou
    Tao, Huawei
    Zhao, Li
    Zou, Cairong
    [J]. IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2022, 14 (04) : 1564 - 1573
  • [9] Speech Emotion Recognition Using Convolutional-Recurrent Neural Networks with Attention Model
    Mu, Yawei
    Gomez, Hernandez
    Cano Montes, Antonio
    Alcaraz Martinez, Carlos
    Wang, Xuetian
    Gao, Hongmin
    [J]. 2ND INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING, INFORMATION SCIENCE AND INTERNET TECHNOLOGY, CII 2017, 2017, : 341 - 350
  • [10] 3-D Convolutional Recurrent Neural Networks With Attention Model for Speech Emotion Recognition
    Chen, Mingyi
    He, Xuanji
    Yang, Jing
    Zhang, Han
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2018, 25 (10) : 1440 - 1444