SPEECH EMOTION RECOGNITION WITH DUAL-SEQUENCE LSTM ARCHITECTURE

Authors
Wang, Jianyou [1]
Xue, Michael [1]
Culhane, Ryan [1]
Diao, Enmao [1]
Ding, Jie [2]
Tarokh, Vahid [1]
Affiliations
[1] Duke Univ, Dept Elect & Comp Engn, Durham, NC 27708 USA
[2] Univ Minnesota Twin Cities, Sch Stat, Minneapolis, MN USA
Keywords
Speech Emotion Recognition; Mel-Spectrogram; LSTM; Dual-Sequence LSTM; Dual-Level Model
DOI
10.1109/icassp40776.2020.9054629
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Speech Emotion Recognition (SER) has emerged as a critical component of the next generation of human-machine interfacing technologies. In this work, we propose a new dual-level model that predicts emotions based on both MFCC features and mel-spectrograms produced from raw audio signals. Each utterance is preprocessed into MFCC features and two mel-spectrograms at different time-frequency resolutions. A standard LSTM processes the MFCC features, while a novel LSTM architecture, denoted as Dual-Sequence LSTM (DS-LSTM), processes the two mel-spectrograms simultaneously. The outputs are later averaged to produce a final classification of the utterance. Our proposed model achieves, on average, a weighted accuracy of 72.7% and an unweighted accuracy of 73.3%, a 6% improvement over current state-of-the-art unimodal models, and is comparable with multimodal models that leverage textual information as well as audio signals.
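The averaging step and the two reported metrics can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that each of the two models emits a per-class probability vector and that these are averaged element-wise; the abstract does not say whether probabilities or raw scores are averaged. It also assumes the definitions conventional in the SER literature: weighted accuracy as overall accuracy, and unweighted accuracy as the mean of per-class recalls. All function names are hypothetical.

```python
from collections import defaultdict


def average_fusion(probs_a, probs_b):
    """Element-wise average of two per-class probability vectors.

    Assumption: both models score the same classes in the same order.
    """
    if len(probs_a) != len(probs_b):
        raise ValueError("both models must score the same set of classes")
    return [(a + b) / 2.0 for a, b in zip(probs_a, probs_b)]


def predict(probs):
    """Index of the highest-scoring class (first index wins on ties)."""
    return max(range(len(probs)), key=probs.__getitem__)


def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: correct utterances / all utterances."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Made-up scores for four utterances and three emotion classes.
lstm_probs = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.4, 0.4, 0.2], [0.1, 0.2, 0.7]]
ds_lstm_probs = [[0.5, 0.4, 0.1], [0.7, 0.2, 0.1], [0.1, 0.7, 0.2], [0.2, 0.2, 0.6]]
labels = [0, 1, 1, 2]

preds = [predict(average_fusion(a, b)) for a, b in zip(lstm_probs, ds_lstm_probs)]
print("predictions:", preds)
print("weighted accuracy:", weighted_accuracy(labels, preds))
print("unweighted accuracy:", unweighted_accuracy(labels, preds))
```

The two metrics diverge when classes are imbalanced: in the toy data one of the two class-1 utterances is misclassified, which costs a quarter of the overall accuracy but half of that class's recall.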
Pages: 6474-6478
Page count: 5