ADIEU FEATURES? END-TO-END SPEECH EMOTION RECOGNITION USING A DEEP CONVOLUTIONAL RECURRENT NETWORK

Authors
Trigeorgis, George [1]
Ringeval, Fabien [2, 3]
Brueckner, Raymond [3, 4]
Marchi, Erik [3]
Nicolaou, Mihalis A. [5]
Schuller, Bjoern [1, 2, 6]
Zafeiriou, Stefanos [1]
Affiliations
[1] Imperial Coll London, Dept Comp, London, England
[2] Univ Passau, Chair Complex & Intelligent Syst, Passau, Germany
[3] Tech Univ Munich, MMK, Machine Intelligence & Signal Proc Grp, Munich, Germany
[4] Nuance Commun Deutschland GmbH, Ulm, Germany
[5] Goldsmiths Univ London, Dept Comp, London, England
[6] audEERING UG, Gilching, Germany
Keywords
end-to-end learning; raw waveform; emotion recognition; deep learning; CNN; LSTM; neural networks
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The automatic recognition of spontaneous emotions from speech is a challenging task. On the one hand, acoustic features need to be robust enough to capture the emotional content for various styles of speaking; on the other, machine learning algorithms need to be insensitive to outliers while being able to model the context. Whereas the latter has been tackled by the use of Long Short-Term Memory (LSTM) networks, the former is still under very active investigation, even though more than a decade of research has provided a large set of acoustic descriptors. In this paper, we propose a solution to the problem of 'context-aware' emotionally relevant feature extraction by combining Convolutional Neural Networks (CNNs) with LSTM networks, in order to automatically learn the best representation of the speech signal directly from the raw time representation. In this novel work on so-called end-to-end speech emotion recognition, we show that the proposed topology significantly outperforms traditional approaches based on signal processing techniques for the prediction of spontaneous and natural emotions on the RECOLA database.
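The pipeline outlined in the abstract (convolution over the raw waveform, followed by a recurrent LSTM stage that models context) can be illustrated with a minimal sketch. This is not the authors' network: the kernel length, pooling size, single-unit LSTM cell, random untrained weights, and all function names below are assumptions chosen only to keep the example short and dependency-free.

```python
import math
import random


def conv1d_relu(signal, kernel):
    """Valid 1-D cross-correlation of the raw signal with one kernel, then ReLU."""
    n = len(kernel)
    out = []
    for start in range(len(signal) - n + 1):
        acc = sum(signal[start + k] * kernel[k] for k in range(n))
        out.append(max(0.0, acc))
    return out


def max_pool(seq, size):
    """Non-overlapping max pooling; an incomplete trailing window is dropped."""
    return [max(seq[i:i + size]) for i in range(0, len(seq) - size + 1, size)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h, c, w):
    """One step of a single-unit LSTM cell with scalar input x."""
    i = sigmoid(w["wi"] * x + w["ui"] * h + w["bi"])    # input gate
    f = sigmoid(w["wf"] * x + w["uf"] * h + w["bf"])    # forget gate
    o = sigmoid(w["wo"] * x + w["uo"] * h + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h + w["bg"])  # candidate cell value
    c = f * c + i * g
    h = o * math.tanh(c)
    return h, c


def predict(waveform, kernel, weights, pool=4):
    """Convolutional front end on raw samples, then an LSTM over the pooled frames."""
    frames = max_pool(conv1d_relu(waveform, kernel), pool)
    h, c = 0.0, 0.0
    outputs = []
    for x in frames:
        h, c = lstm_step(x, h, c, weights)
        outputs.append(h)  # one prediction per pooled frame
    return outputs


# Hypothetical input: 160 random samples standing in for a raw speech segment.
rng = random.Random(0)
waveform = [rng.uniform(-1.0, 1.0) for _ in range(160)]
kernel = [rng.uniform(-0.5, 0.5) for _ in range(5)]
gate_names = ["wi", "ui", "bi", "wf", "uf", "bf", "wo", "uo", "bo", "wg", "ug", "bg"]
weights = {name: rng.uniform(-0.5, 0.5) for name in gate_names}

outputs = predict(waveform, kernel, weights)
print(len(outputs))
```

In a trained system, the kernel and gate weights would be learned jointly from labelled recordings, which is what makes the approach end-to-end; here they are random, so the outputs carry no meaning beyond showing the data flow.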
Pages: 5200-5204
Number of pages: 5