RNN with Improved Temporal Modeling for Speech Emotion Recognition

Cited by: 0
Authors
Lieskovska, Eva [1]
Jakubec, Maros [1]
Jarina, Roman [1]
Affiliations
[1] Univ Zilina, Fac Elect Engn & Informat Technol, Zilina, Slovakia
Keywords
speech emotion recognition; IEMOCAP; RNN; deep learning
DOI
10.1109/RADIOELEKTRONIKA54537.2022.9764901
Chinese Library Classification
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Emotions are a natural part of daily human activities and play a key role in human decision-making, interactions, and cognitive processes. In recent years, automatic speech emotion recognition has aroused great interest due to its wide potential use in environments where machines need more natural human interaction or monitoring (e.g., detection of stress levels in pilots). In this work, we focused on exploring new ways of temporal modeling in recurrent neural network (RNN) architectures. The proposed solution leverages contextual information from the speech sequence using a recurrent model inspired by the Siamese RNN. Compared to a baseline LSTM model, the proposed solution achieved a relative accuracy improvement of approximately 5% on the IEMOCAP database.
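The abstract reports a relative, not absolute, accuracy improvement. As a minimal sketch of the difference, the snippet below computes both quantities; the function name `relative_improvement` and the two accuracy values are hypothetical, chosen only for illustration, and are not figures from the paper.

```python
def relative_improvement(baseline_acc: float, proposed_acc: float) -> float:
    """Return the gain of proposed_acc over baseline_acc as a fraction of the baseline."""
    if baseline_acc <= 0:
        raise ValueError("baseline accuracy must be positive")
    return (proposed_acc - baseline_acc) / baseline_acc


# Hypothetical accuracies for illustration only (NOT results from the paper).
baseline = 0.60
proposed = 0.63

rel = relative_improvement(baseline, proposed)  # fraction of the baseline
abs_gain = proposed - baseline                  # difference in accuracy

print(f"relative: {rel:.1%}, absolute: {abs_gain * 100:.1f} percentage points")
```

With these assumed values, a relative gain of about 5% corresponds to a smaller absolute gain in percentage points, which is why the two should not be read interchangeably.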
Pages: 5 - 9
Number of pages: 5