Attention-LSTM-Attention Model for Speech Emotion Recognition and Analysis of IEMOCAP Database

Cited by: 55
Authors
Yu, Yeonguk [1]
Kim, Yoon-Joong [1]
Affiliations
[1] Hanbat Natl Univ, Dept Comp Engn, Daejeon 34158, South Korea
Keywords
speech-emotion recognition; attention mechanism; LSTM
DOI
10.3390/electronics9050713
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
We propose a speech-emotion recognition (SER) model with an "attention-Long Short-Term Memory (LSTM)-attention" component that combines IS09, a commonly used feature set for SER, with the mel spectrogram, and we analyze the reliability problem of the interactive emotional dyadic motion capture (IEMOCAP) database. The attention mechanism of the model focuses on the emotion-related elements of the IS09 and mel spectrogram features and on the emotion-related time spans within those features. Thus, the model extracts emotion information from a given speech signal. The proposed model for the baseline study achieved a weighted accuracy (WA) of 68% on the improvised dataset of IEMOCAP. However, neither the proposed model of the main study nor the modified models could exceed a WA of 68% on the improvised dataset. This is because of the reliability limit of the IEMOCAP dataset. A more reliable dataset is required for a more accurate evaluation of the model's performance. Therefore, in this study, we reconstructed a more reliable dataset based on the labeling results provided by IEMOCAP. On this more reliable dataset, the model achieved a WA of 73%.
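The abstract describes attention that weights the emotion-related time spans of a feature sequence. As a rough illustration only, the sketch below shows generic softmax attention pooling over time frames in plain Python. It is not the authors' architecture: the function names, the dot-product scoring rule, the `query` vector (standing in for learned parameters), and the toy frame values are all assumptions made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames, query):
    """Pool T feature vectors into one vector using attention weights.

    frames: list of T feature vectors (e.g. per-frame LSTM outputs).
    query:  hypothetical scoring vector; in a trained model this would
            be learned, here it is supplied by hand.
    """
    # Score each frame with a dot product against the query (assumed rule).
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    # Normalize the scores so the weights over time sum to 1.
    weights = softmax(scores)
    # Weighted sum over time: frames with higher weight dominate the result.
    dim = len(frames[0])
    pooled = [
        sum(w * frame[d] for w, frame in zip(weights, frames))
        for d in range(dim)
    ]
    return pooled, weights


# Toy input: three frames of two-dimensional features (made-up values).
frames = [[0.1, 0.0], [2.0, 1.0], [0.2, 0.1]]
query = [1.0, 1.0]
pooled, weights = attention_pool(frames, query)
```

With this toy input the second frame has the highest score, so it receives the largest weight and contributes most to the pooled vector.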
Pages: 12
Related papers
50 in total
  • [1] Speech Emotion Recognition Based on Speech Segment Using LSTM with Attention Model
    Atmaja, Bagus Tris
    Akagi, Masato
    2019 IEEE INTERNATIONAL CONFERENCE ON SIGNALS AND SYSTEMS (ICSIGSYS), 2019: 40-44
  • [2] Hybrid LSTM-Attention and CNN Model for Enhanced Speech Emotion Recognition
    Makhmudov, Fazliddin
    Kutlimuratov, Alpamis
    Cho, Young-Im
    APPLIED SCIENCES-BASEL, 2024, 14 (23)
  • [3] Attention-Based Dense LSTM for Speech Emotion Recognition
    Xie, Yue
    Liang, Ruiyu
    Liang, Zhenlin
    Zhao, Li
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2019, E102D (07): 1426-1429
  • [4] Siamese Attention-Based LSTM for Speech Emotion Recognition
    Nizamidin, Tashpolat
    Zhao, Li
    Liang, Ruiyu
    Xie, Yue
    Hamdulla, Askar
    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2020, E103A (07): 937-941
  • [5] Empirical Interpretation of Speech Emotion Perception with Attention Based Model for Speech Emotion Recognition
    Jalal, Md Asif
    Milner, Rosanna
    Hain, Thomas
    INTERSPEECH 2020, 2020: 4113-4117
  • [6] A Robust Framework for Speech Emotion Recognition Using Attention Based Convolutional Peephole LSTM
    Paramasivam, Ramya
    Lavanya, K.
    Divakarachari, Parameshachari Bidare
    Camacho, David
    INTERNATIONAL JOURNAL OF INTERACTIVE MULTIMEDIA AND ARTIFICIAL INTELLIGENCE, 2025
  • [7] Sparse Graphic Attention LSTM for EEG Emotion Recognition
    Liu, Suyuan
    Zheng, Wenming
    Song, Tengfei
    Zong, Yuan
    NEURAL INFORMATION PROCESSING (ICONIP 2019), PT IV, 2019, 1142: 690-697
  • [8] Attention guided 3D CNN-LSTM model for accurate speech based emotion recognition
    Atila, Orhan
    Sengur, Abdulkadir
    APPLIED ACOUSTICS, 2021, 182
  • [9] The Impact of Attention Mechanisms on Speech Emotion Recognition
    Chen, Shouyan
    Zhang, Mingyan
    Yang, Xiaofen
    Zhao, Zhijia
    Zou, Tao
    Sun, Xinqi
    SENSORS, 2021, 21 (22)
  • [10] Self-attention for Speech Emotion Recognition
    Tarantino, Lorenzo
    Garner, Philip N.
    Lazaridis, Alexandros
    INTERSPEECH 2019, 2019: 2578-2582