ADIEU FEATURES? END-TO-END SPEECH EMOTION RECOGNITION USING A DEEP CONVOLUTIONAL RECURRENT NETWORK

Cited by: 0
Authors
Trigeorgis, George [1]
Ringeval, Fabien [2, 3]
Brueckner, Raymond [3, 4]
Marchi, Erik [3]
Nicolaou, Mihalis A. [5]
Schuller, Bjoern [1, 2, 6]
Zafeiriou, Stefanos [1]
Affiliations
[1] Imperial Coll London, Dept Comp, London, England
[2] Univ Passau, Chair Complex & Intelligent Syst, Passau, Germany
[3] Tech Univ Munich, MMK, Machine Intelligence & Signal Proc Grp, Munich, Germany
[4] Nuance Commun Deutschland GmbH, Ulm, Germany
[5] Goldsmiths Univ London, Dept Comp, London, England
[6] audEERING UG, Gilching, Germany
Keywords
end-to-end learning; raw waveform; emotion recognition; deep learning; CNN; LSTM; NEURAL-NETWORKS;
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
The automatic recognition of spontaneous emotions from speech is a challenging task. On the one hand, acoustic features need to be robust enough to capture the emotional content for various styles of speaking, while on the other, machine learning algorithms need to be insensitive to outliers while being able to model the context. Whereas the latter has been tackled by the use of Long Short-Term Memory (LSTM) networks, the former is still under very active investigation, even though more than a decade of research has provided a large set of acoustic descriptors. In this paper, we propose a solution to the problem of 'context-aware' emotionally relevant feature extraction, by combining Convolutional Neural Networks (CNNs) with LSTM networks, in order to automatically learn the best representation of the speech signal directly from the raw time representation. In this novel work on so-called end-to-end speech emotion recognition, we show that the proposed topology significantly outperforms the traditional approaches based on signal processing techniques for the prediction of spontaneous and natural emotions on the RECOLA database.
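To make the idea in the abstract concrete, the following is a minimal sketch of a convolutional recurrent pipeline that maps a raw waveform to one prediction per frame. It is not the authors' implementation: the layer sizes, kernel width, stride, random untrained weights, the tanh output (standing in for a single continuous emotion dimension), and the helper names `conv1d_relu`, `lstm_step`, `init_params`, and `predict` are all assumptions chosen for illustration, and only the Python standard library is used.

```python
import math
import random


def conv1d_relu(signal, kernels, stride):
    """Slide each kernel over the raw samples (valid cross-correlation) and apply ReLU.

    Returns one feature vector (one value per kernel) for every window position.
    """
    width = len(kernels[0])
    frames = []
    for start in range(0, len(signal) - width + 1, stride):
        window = signal[start:start + width]
        frames.append([max(0.0, sum(w * x for w, x in zip(k, window))) for k in kernels])
    return frames


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h, c, weights):
    """One LSTM step; each gate's rows act on the concatenation [x, h, 1] (1 is the bias input)."""
    v = x + h + [1.0]

    def gate(name, act):
        return [act(sum(w * a for w, a in zip(row, v))) for row in weights[name]]

    i = gate("i", sigmoid)    # input gate
    f = gate("f", sigmoid)    # forget gate
    o = gate("o", sigmoid)    # output gate
    g = gate("g", math.tanh)  # candidate cell state
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def init_params(n_kernels, kernel_width, hidden, seed=0):
    """Random, untrained weights (assumption: a real system would learn these end to end)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "kernels": mat(n_kernels, kernel_width),
        "lstm": {name: mat(hidden, n_kernels + hidden + 1) for name in "ifog"},
        "out": mat(1, hidden + 1)[0],
    }


def predict(signal, params, stride):
    """Raw waveform -> convolutional features -> LSTM over time -> one value in (-1, 1) per frame."""
    frames = conv1d_relu(signal, params["kernels"], stride)
    hidden = len(params["lstm"]["i"])
    h, c = [0.0] * hidden, [0.0] * hidden
    outputs = []
    for x in frames:
        h, c = lstm_step(x, h, c, params["lstm"])
        outputs.append(math.tanh(sum(w * a for w, a in zip(params["out"], h + [1.0]))))
    return outputs


# Synthetic stand-in for a speech signal: a noisy sinusoid of 1600 samples.
noise = random.Random(1)
waveform = [math.sin(0.05 * t) + 0.1 * noise.uniform(-1, 1) for t in range(1600)]
params = init_params(n_kernels=4, kernel_width=80, hidden=8)
trace = predict(waveform, params, stride=40)
print(len(trace), trace[:3])
```

With 1600 samples, a kernel width of 80, and a stride of 40, the convolution produces 39 frames, so `trace` holds 39 values. The convolution plays the role that hand-crafted acoustic descriptors play in traditional pipelines, while the recurrent state carries context from earlier frames into each prediction.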
Pages: 5200-5204
Number of pages: 5
Related papers
50 in total
  • [21] DeepVANet: A Deep End-to-End Network for Multi-modal Emotion Recognition
    Zhang, Yuhao
    Hossain, Md Zakir
    Rahman, Shafin
    [J]. HUMAN-COMPUTER INTERACTION, INTERACT 2021, PT III, 2021, 12934 : 227 - 237
  • [22] CONVOLUTIONAL DROPOUT AND WORDPIECE AUGMENTATION FOR END-TO-END SPEECH RECOGNITION
    Xu, Hainan
    Huang, Yinghui
    Zhu, Yun
    Audhkhasi, Kartik
    Ramabhadran, Bhuvana
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 5984 - 5988
  • [23] End-to-end attention convolutional recurrent network for online handwritten Chinese text recognition
    Qu, Xiwen
    Wu, Zhihong
    Huang, Jun
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (23) : 62541 - 62558
  • [24] DEEP CONTEXT: END-TO-END CONTEXTUAL SPEECH RECOGNITION
    Pundak, Golan
    Sainath, Tara N.
    Prabhavalkar, Rohit
    Kannan, Anjuli
    Zhao, Ding
    [J]. 2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 418 - 425
  • [25] Speech Emotion Recognition Using Deep Convolutional Neural Network and Simple Recurrent Unit
    Jiang, Pengxu
    Fu, Hongliang
    Tao, Huawei
    [J]. ENGINEERING LETTERS, 2019, 27 (04) : 901 - 906
  • [26] Improvement of Speech Emotion Recognition by Deep Convolutional Neural Network and Speech Features
    Mohanty, Aniruddha
    Cherukuri, Ravindranath C.
    Prusty, Alok Ranjan
    [J]. THIRD CONGRESS ON INTELLIGENT SYSTEMS, CIS 2022, VOL 1, 2023, 608 : 117 - 129
  • [27] EEG-based emotion recognition using an end-to-end regional-asymmetric convolutional neural network
    Cui, Heng
    Liu, Aiping
    Zhang, Xu
    Chen, Xiang
    Wang, Kongqiao
    Chen, Xun
    [J]. KNOWLEDGE-BASED SYSTEMS, 2020, 205
  • [28] End-to-End Speech Command Recognition with Capsule Network
    Bae, Jaesung
    Kim, Dae-Shik
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 776 - 780
  • [29] End-to-end speech emotion recognition using a novel context-stacking dilated convolution neural network
    Tang, Duowei
    Kuppens, Peter
    Geurts, Luc
    van Waterschoot, Toon
    [J]. EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2021, 2021 (01)
  • [30] End-to-end speech emotion recognition using a novel context-stacking dilated convolution neural network
    Duowei Tang
    Peter Kuppens
    Luc Geurts
    Toon van Waterschoot
    [J]. EURASIP Journal on Audio, Speech, and Music Processing, 2021