ADIEU FEATURES? END-TO-END SPEECH EMOTION RECOGNITION USING A DEEP CONVOLUTIONAL RECURRENT NETWORK

Cited by: 0
Authors
Trigeorgis, George [1 ]
Ringeval, Fabien [2 ,3 ]
Brueckner, Raymond [3 ,4 ]
Marchi, Erik [3 ]
Nicolaou, Mihalis A. [5 ]
Schuller, Bjoern [1 ,2 ,6 ]
Zafeiriou, Stefanos [1 ]
Affiliations
[1] Imperial Coll London, Dept Comp, London, England
[2] Univ Passau, Chair Complex & Intelligent Syst, Passau, Germany
[3] Tech Univ Munich, MMK, Machine Intelligence & Signal Proc Grp, Munich, Germany
[4] Nuance Commun Deutschland GmbH, Ulm, Germany
[5] Goldsmiths Univ London, Dept Comp, London, England
[6] audEERING UG, Gilching, Germany
Keywords
end-to-end learning; raw waveform; emotion recognition; deep learning; CNN; LSTM; neural networks;
DOI
Not available
Chinese Library Classification (CLC)
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
The automatic recognition of spontaneous emotions from speech is a challenging task. On the one hand, acoustic features need to be robust enough to capture the emotional content across various speaking styles, while on the other, machine learning algorithms need to be insensitive to outliers while still being able to model context. Whereas the latter has been tackled by the use of Long Short-Term Memory (LSTM) networks, the former is still under very active investigation, even though more than a decade of research has produced a large set of acoustic descriptors. In this paper, we propose a solution to the problem of 'context-aware', emotion-relevant feature extraction by combining Convolutional Neural Networks (CNNs) with LSTM networks, in order to automatically learn the best representation of the speech signal directly from the raw time representation. In this novel work on so-called end-to-end speech emotion recognition, we show that the proposed topology significantly outperforms traditional approaches based on signal processing techniques for the prediction of spontaneous and natural emotions on the RECOLA database.
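The abstract describes a convolutional front-end operating directly on the raw waveform, followed by LSTM layers that model temporal context and a frame-level prediction head. Below is a minimal PyTorch-style sketch of such a CNN + LSTM topology; the kernel widths, channel counts, pooling factors, and the two-dimensional (arousal/valence) output are illustrative assumptions and do not reproduce the exact configuration reported in the paper.

# Minimal sketch of a CNN + LSTM topology for end-to-end speech emotion
# recognition from raw audio. All hyperparameters below are assumptions
# for illustration, not the paper's configuration.
import torch
import torch.nn as nn


class ConvRecurrentSER(nn.Module):
    def __init__(self, hidden_size: int = 128, num_outputs: int = 2):
        super().__init__()
        # Convolutional front-end on the raw waveform (batch, 1, samples):
        # learns a time-frequency-like representation instead of hand-crafted features.
        self.feature_extractor = nn.Sequential(
            nn.Conv1d(1, 40, kernel_size=80, stride=4, padding=38),
            nn.ReLU(),
            nn.MaxPool1d(kernel_size=2),
            nn.Conv1d(40, 40, kernel_size=40, stride=2, padding=19),
            nn.ReLU(),
            nn.MaxPool1d(kernel_size=10),
        )
        # Recurrent part models temporal context over the learned features.
        self.lstm = nn.LSTM(input_size=40, hidden_size=hidden_size,
                            num_layers=2, batch_first=True)
        # Frame-level regression head, e.g. continuous arousal and valence.
        self.head = nn.Linear(hidden_size, num_outputs)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, samples) of raw audio
        x = self.feature_extractor(waveform.unsqueeze(1))  # (batch, 40, frames)
        x = x.transpose(1, 2)                              # (batch, frames, 40)
        x, _ = self.lstm(x)
        return self.head(x)                                # (batch, frames, num_outputs)


if __name__ == "__main__":
    model = ConvRecurrentSER()
    audio = torch.randn(4, 16000)   # four 1-second clips at 16 kHz
    print(model(audio).shape)       # torch.Size([4, 100, 2]) with the settings above

In this sketch the whole pipeline is trained jointly, so the convolutional layers are optimized for the recognition objective rather than fixed in advance, which is the core idea behind the end-to-end approach described in the abstract.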
Pages: 5200-5204
Number of pages: 5
Related Papers
50 records in total
  • [41] End-to-end Triplet Loss based Emotion Embedding System for Speech Emotion Recognition
    Kumar, Puneet
    Jain, Sidharth
    Raman, Balasubramanian
    Roy, Partha Pratim
    Iwamura, Masakazu
    [J]. 2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 8766 - 8773
  • [42] End-to-End Automatic Speech Recognition with Deep Mutual Learning
    Masumura, Ryo
    Ihori, Mana
    Takashima, Akihiko
    Tanaka, Tomohiro
    Ashihara, Takanori
    [J]. 2020 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2020, : 632 - 637
  • [43] Parallelized Convolutional Recurrent Neural Network With Spectral Features for Speech Emotion Recognition
    Jiang, Pengxu
    Fu, Hongliang
    Tao, Huawei
    Lei, Peizhi
    Zhao, Li
    [J]. IEEE ACCESS, 2019, 7 : 90368 - 90377
  • [44] End-to-End Driving Activities and Secondary Tasks Recognition Using Deep Convolutional Neural Network and Transfer Learning
    Xing, Yang
    Tang, Jianlin
    Liu, Hong
    Lv, Chen
    Cao, Dongpu
    Velenis, Efstathios
    Wang, Fei-Yue
    [J]. 2018 IEEE INTELLIGENT VEHICLES SYMPOSIUM (IV), 2018, : 1626 - 1631
  • [45] An End-to-End Deep Neural Network for Facial Emotion Classification
    Jalal, Md Asif
    Mihaylova, Lyudmila
    Moore, Roger K.
    [J]. 2019 22ND INTERNATIONAL CONFERENCE ON INFORMATION FUSION (FUSION 2019), 2019,
  • [46] End-to-End Hardware Accelerator for Deep Convolutional Neural Network
    Chang, Tian-Sheuan
    [J]. 2018 INTERNATIONAL SYMPOSIUM ON VLSI DESIGN, AUTOMATION AND TEST (VLSI-DAT), 2018,
  • [47] End-to-end music emotion variation detection using iteratively reconstructed deep features
    Orjesek, Richard
    Jarina, Roman
    Chmulik, Michal
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (04) : 5017 - 5031
  • [48] End-to-end music emotion variation detection using iteratively reconstructed deep features
    Richard Orjesek
    Roman Jarina
    Michal Chmulik
    [J]. Multimedia Tools and Applications, 2022, 81 : 5017 - 5031
  • [49] End-to-End Deep Learning Speech Recognition Model for Silent Speech Challenge
    Kimura, Naoki
    Su, Zixiong
    Saeki, Takaaki
    [J]. INTERSPEECH 2020, 2020, : 1025 - 1026
  • [50] Improved End-to-End Speech Emotion Recognition Using Self Attention Mechanism and Multitask Learning
    Li, Yuanchao
    Zhao, Tianyu
    Kawahara, Tatsuya
    [J]. INTERSPEECH 2019, 2019, : 2803 - 2807