Attention-LSTM-Attention Model for Speech Emotion Recognition and Analysis of IEMOCAP Database

Cited by: 55
Authors
Yu, Yeonguk [1 ]
Kim, Yoon-Joong [1 ]
Affiliations
[1] Hanbat Natl Univ, Dept Comp Engn, Daejeon 34158, South Korea
Keywords
speech-emotion recognition; attention mechanism; LSTM
DOI
10.3390/electronics9050713
CLC number
TP [automation technology, computer technology]
Discipline code
0812
Abstract
We propose a speech-emotion recognition (SER) model with an "attention-Long Short-Term Memory (LSTM)-attention" component that combines IS09, a feature set commonly used for SER, with the mel spectrogram, and we analyze the reliability problem of the interactive emotional dyadic motion capture (IEMOCAP) database. The attention mechanism focuses on the emotion-related elements of the IS09 and mel-spectrogram features and on the emotion-related time steps of the feature sequence, so that the model extracts emotion information from a given speech signal. The proposed model of the baseline study achieved a weighted accuracy (WA) of 68% on the improvised dataset of IEMOCAP. However, neither the proposed model of the main study nor its modified variants could exceed a WA of 68% on the improvised dataset, which we attribute to the reliability limit of the IEMOCAP dataset. A more reliable dataset is required for a more accurate evaluation of the model's performance, so we reconstructed one based on the labeling results provided with IEMOCAP. On this more reliable dataset, the model achieved a WA of 73%.
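The abstract describes a model that applies attention over the combined IS09 and mel-spectrogram features, an LSTM over time, and a second attention over the time dimension. The sketch below is a minimal, hypothetical PyTorch rendering of such an attention-LSTM-attention block; the dimensions, the element-wise feature attention, the softmax temporal pooling, and the treatment of IS09 as a frame-level feature are assumptions of this illustration, not the authors' published implementation.

```python
# Hypothetical sketch (not the authors' code) of an "attention-LSTM-attention"
# block: feature-wise attention over the concatenated IS09 + mel-spectrogram
# vector at each frame, an LSTM over time, and a temporal attention that pools
# the LSTM outputs before classification. All dimensions are illustrative.
import torch
import torch.nn as nn

IS09_DIM, MEL_DIM, HIDDEN, NUM_CLASSES = 384, 128, 128, 4


class AttentionLSTMAttention(nn.Module):
    def __init__(self):
        super().__init__()
        in_dim = IS09_DIM + MEL_DIM
        # Feature attention: scores each element of the frame-level feature vector.
        self.feat_attn = nn.Linear(in_dim, in_dim)
        self.lstm = nn.LSTM(in_dim, HIDDEN, batch_first=True)
        # Temporal attention: scores each time step of the LSTM output sequence.
        self.time_attn = nn.Linear(HIDDEN, 1)
        self.classifier = nn.Linear(HIDDEN, NUM_CLASSES)

    def forward(self, is09, mel):
        # is09: (batch, time, IS09_DIM), mel: (batch, time, MEL_DIM)
        x = torch.cat([is09, mel], dim=-1)
        # Emphasize emotion-related feature elements at each frame.
        x = x * torch.softmax(self.feat_attn(x), dim=-1)
        out, _ = self.lstm(x)                          # (batch, time, HIDDEN)
        # Emphasize emotion-related time steps, then pool to one utterance vector.
        w = torch.softmax(self.time_attn(out), dim=1)  # (batch, time, 1)
        utterance = (w * out).sum(dim=1)               # (batch, HIDDEN)
        return self.classifier(utterance)              # emotion logits


if __name__ == "__main__":
    model = AttentionLSTMAttention()
    logits = model(torch.randn(2, 300, IS09_DIM), torch.randn(2, 300, MEL_DIM))
    print(logits.shape)  # torch.Size([2, 4])
```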
Pages: 12
Related papers
50 items in total
  • [21] SPEECH EMOTION RECOGNITION WITH MULTISCALE AREA ATTENTION AND DATA AUGMENTATION
    Xu, Mingke
    Zhang, Fan
    Cui, Xiaodong
    Zhang, Wei
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6319 - 6323
  • [22] Speech emotion recognition with embedded attention mechanism and hierarchical context
    Cheng Y.
    Chen Y.
    Chen Y.
    Yang Y.
Harbin Gongye Daxue Xuebao/Journal of Harbin Institute of Technology, 2019, 51 (11): 100 - 107
  • [23] EFFECTIVE ATTENTION MECHANISM IN DYNAMIC MODELS FOR SPEECH EMOTION RECOGNITION
    Hsiao, Po-Wei
    Chen, Chia-Ping
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 2526 - 2530
  • [24] Spatiotemporal and frequential cascaded attention networks for speech emotion recognition
    Li, Shuzhen
    Xing, Xiaofen
    Fan, Weiquan
    Cai, Bolun
    Fordson, Perry
    Xu, Xiangmin
    Neurocomputing, 2021, 448 : 238 - 248
  • [25] Pyramid Memory Block and Timestep Attention for Speech Emotion Recognition
    Gao, Miao
    Yang, Chun
    Zhou, Fang
    Yin, Xu-cheng
    INTERSPEECH 2019, 2019, : 3930 - 3934
  • [26] Improve Accuracy of Speech Emotion Recognition with Attention Head Fusion
    Xu, Mingke
    Zhang, Fan
    Khan, Samee U.
    2020 10TH ANNUAL COMPUTING AND COMMUNICATION WORKSHOP AND CONFERENCE (CCWC), 2020, : 1058 - 1064
  • [28] Speech Emotion Recognition using XGBoost and CNN BLSTM with Attention
    He, Jingru
    Ren, Liyong
    2021 IEEE SMARTWORLD, UBIQUITOUS INTELLIGENCE & COMPUTING, ADVANCED & TRUSTED COMPUTING, SCALABLE COMPUTING & COMMUNICATIONS, INTERNET OF PEOPLE, AND SMART CITY INNOVATIONS (SMARTWORLD/SCALCOM/UIC/ATC/IOP/SCI 2021), 2021, : 154 - 159
  • [30] Auditory attention model based on Chirplet for cross-corpus speech emotion recognition
    Zhang X.
    Song P.
    Zha C.
    Tao H.
    Zhao L.
Journal of Southeast University, (32): 402 - 407