Spatio-Temporal Context Modelling for Speech Emotion Classification

Authors
Jalal, Md Asif [1]
Moore, Roger K. [1]
Hain, Thomas [1]
Affiliation
[1] University of Sheffield, Speech and Hearing Research Group (SPandH), Sheffield, South Yorkshire, England
Keywords
Emotion classification; SER; Deep Neural Networks; Convolutional Neural Network; Attention Network
DOI
10.1109/asru46091.2019.9004037
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Speech emotion recognition (SER) is a prerequisite for emotional intelligence, which affects how speech is understood. One of the most crucial tasks is to extract from the speech signal patterns that are maximally correlated with the emotion class while remaining invariant to changes in frequency and time and to other external distortions. Learning a contextual emotional feature representation that is independent of speaker and environment is therefore essential. This paper proposes a novel spatiotemporal context modelling framework for robust SER that learns feature representations through acoustic context expansion with high-dimensional feature projection. The framework combines a deep convolutional neural network (CNN) with a self-attention network. The CNNs combine spatiotemporal features, while the attention network produces high-dimensional task-specific features and combines them for context modelling; together, these components provide a state-of-the-art technique for classifying the extracted speech emotion patterns. Speech emotion is a categorical perception representing discrete sensory events. The proposed approach is compared with a wide range of architectures on the RAVDESS (8-class) and IEMOCAP (4-class) emotion classification tasks, and it achieves remarkable absolute gains of 15% and 10%, respectively, over state-of-the-art systems.
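The abstract describes the framework only at a high level. The following is a minimal pure-Python sketch of two of the ingredients it names, under stated assumptions. "Acoustic context expansion" is assumed here to mean splicing each frame with its ±k neighbouring frames. It is followed by a single-head scaled dot-product self-attention layer and mean pooling into one utterance-level vector. The CNN stage, the paper's layer sizes and all training are omitted. The function names (`context_expand`, `self_attention`, `mean_pool`, `random_matrix`), the edge-padding rule and the random untrained weights are hypothetical and are not the authors' implementation.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def context_expand(frames: Matrix, k: int) -> Matrix:
    """Splice each frame with its k left and k right neighbours.

    Assumption: edge frames are padded by repeating the first/last frame,
    so every output row has (2k + 1) * feature_dim values.
    """
    n = len(frames)
    expanded = []
    for t in range(n):
        row: List[float] = []
        for offset in range(-k, k + 1):
            idx = min(max(t + offset, 0), n - 1)  # clamp to a valid frame index
            row.extend(frames[idx])
        expanded.append(row)
    return expanded


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Matrix:
    """Single-head scaled dot-product self-attention over time steps."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(q[0]))
    scores = [[sum(a * b for a, b in zip(qt, ks)) / scale for ks in k] for qt in q]
    weights = [softmax(row) for row in scores]  # each row sums to 1
    return matmul(weights, v)


def mean_pool(x: Matrix) -> List[float]:
    """Average over time to get one utterance-level vector."""
    n = len(x)
    return [sum(col) / n for col in zip(*x)]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Hypothetical untrained projection weights (stand-ins for learned ones)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumption: 5 frames of 2-dim features stand in for CNN outputs.
    frames = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(5)]
    x = context_expand(frames, k=1)  # 5 x 6 after splicing +/-1 frame
    wq, wk, wv = (random_matrix(6, 4, rng) for _ in range(3))
    h = self_attention(x, wq, wk, wv)  # 5 x 4 context-weighted features
    utterance = mean_pool(h)  # length-4 vector for a classifier
    print(len(x), len(x[0]), len(utterance))  # 5 6 4
```

Running the script prints `5 6 4`. Splicing ±1 frame triples the feature dimension from 2 to 6, and the attention layer then projects to 4 dimensions before pooling over time. In a trained system the projection matrices would be learned, and the pooled vector would typically feed a classifier over the emotion classes. That stage is not shown here.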
Pages: 853–859
Page count: 7