Exploring temporal representations by leveraging attention-based bidirectional LSTM-RNNs for multi-modal emotion recognition

Cited by: 126
Authors
Li, Chao [1 ]
Bao, Zhongtian [1 ]
Li, Linhao [2 ,3 ]
Zhao, Ziping [1 ]
Affiliations
[1] Tianjin Normal Univ, Coll Comp & Informat Engn, Tianjin 300387, Peoples R China
[2] Hebei Univ Technol, Sch Artificial Intelligence, Tianjin 300401, Peoples R China
[3] Hebei Univ Technol, Hebei Prov Key Lab Big Data Comp, Tianjin 300401, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Emotion recognition; EEG signals; Physiological signals; Deep learning; Multimedia content; Multi-modal fusion; CLASSIFICATION; MODELS;
DOI
10.1016/j.ipm.2019.102185
CLC Number
TP [Automation and Computer Technology];
Subject Classification Code
0812;
Abstract
Emotion recognition makes it possible to automatically perceive a user's emotional response to multimedia content through implicit annotation, which in turn supports effective user-centric services. Physiology-based approaches have attracted increasing attention from researchers because of their objectivity in representing emotion. Conventional approaches to emotion recognition have mostly focused on extracting various kinds of hand-crafted features. However, hand-crafted features require domain knowledge for the specific task, and designing suitable features can be time-consuming. Exploring the most effective physiology-based temporal feature representation for emotion recognition has therefore become the core problem of much of this work. In this paper, we propose a multi-modal attention-based BLSTM network framework for efficient emotion recognition. First, the raw physiological signals from each channel are transformed into spectrogram images to capture their time and frequency information. Second, attention-based bidirectional Long Short-Term Memory recurrent neural networks (BLSTM-RNNs) are used to automatically learn the best temporal features. The learned deep features are then fed into a deep neural network (DNN) to predict the emotion probabilities for each channel. Finally, a decision-level fusion strategy is used to predict the final emotion. Experimental results on the AMIGOS dataset show that our method outperforms other state-of-the-art methods.
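Since the abstract lays out the pipeline step by step (per-channel spectrograms, an attention-based BLSTM temporal encoder, a per-channel DNN classifier, and decision-level fusion), a minimal sketch may help make the data flow concrete. The following PyTorch code is an illustrative reconstruction only: the layer sizes, the soft-attention formulation, and the probability-averaging fusion rule are assumptions, not the authors' published configuration.

```python
# A minimal sketch of the pipeline described in the abstract (PyTorch).
# Hidden sizes, the soft-attention form, and the averaging fusion rule
# are illustrative assumptions, not the paper's reported configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentiveBLSTM(nn.Module):
    """Attention-based bidirectional LSTM over spectrogram time frames,
    followed by a DNN that outputs per-channel emotion probabilities."""

    def __init__(self, n_freq_bins=128, hidden=64, n_classes=2):
        super().__init__()
        self.blstm = nn.LSTM(n_freq_bins, hidden,
                             batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)    # frame-level attention scores
        self.dnn = nn.Sequential(               # per-channel classifier
            nn.Linear(2 * hidden, 64), nn.ReLU(),
            nn.Linear(64, n_classes),
        )

    def forward(self, x):                       # x: (batch, frames, freq_bins)
        h, _ = self.blstm(x)                    # (batch, frames, 2 * hidden)
        w = F.softmax(self.attn(h), dim=1)      # attention weights over frames
        context = (w * h).sum(dim=1)            # attention-weighted summary
        return F.softmax(self.dnn(context), dim=-1)

def fuse_decisions(channel_probs):
    """Decision-level fusion: average per-channel probabilities and take
    the argmax (one plausible rule; the abstract does not specify it)."""
    return torch.stack(channel_probs).mean(dim=0).argmax(dim=-1)

# Usage: one network per physiological channel (shared here for brevity),
# each channel given as a batch of spectrograms (batch, frames, freq_bins).
model = AttentiveBLSTM()
eeg = torch.randn(4, 100, 128)                  # e.g. EEG spectrograms
ecg = torch.randn(4, 100, 128)                  # e.g. ECG spectrograms
labels = fuse_decisions([model(eeg), model(ecg)])
print(labels.shape)                             # torch.Size([4])
```

Keeping each channel's network separate until the final averaging step is what makes the fusion "decision level": each modality votes with its own probability estimate rather than sharing intermediate features.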
Pages: 9
Related Papers
50 items in total
  • [1] Exploring Spatio-Temporal Representations by Integrating Attention-based Bidirectional-LSTM-RNNs and FCNs for Speech Emotion Recognition
    Zhao, Ziping
    Zheng, Yu
    Zhang, Zixing
    Wang, Haishuai
    Zhao, Yiqin
    Li, Chao
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 272 - 276
  • [2] Multi-modal Emotion Recognition with Temporal-Band Attention Based on LSTM-RNN
    Liu, Jiamin
    Su, Yuanqi
    Liu, Yuehu
    [J]. ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2017, PT I, 2018, 10735 : 194 - 204
  • [3] AFLEMP: Attention-based Federated Learning for Emotion recognition using Multi-modal Physiological data
    Gahlan, Neha
    Sethia, Divyashikha
    [J]. BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2024, 94
  • [4] Multi-modal Attention for Speech Emotion Recognition
    Pan, Zexu
    Luo, Zhaojie
    Yang, Jichen
    Li, Haizhou
    [J]. INTERSPEECH 2020, 2020, : 364 - 368
  • [5] Attention-based Spatio-Temporal Graphic LSTM for EEG Emotion Recognition
    Li, Xiaoxu
    Zheng, Wenming
    Zong, Yuan
    Chang, Hongli
    Lu, Cheng
    [J]. 2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [6] Siamese Attention-Based LSTM for Speech Emotion Recognition
    Nizamidin, Tashpolat
    Zhao, Li
    Liang, Ruiyu
    Xie, Yue
    Hamdulla, Askar
    [J]. IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2020, E103A (07) : 937 - 941
  • [7] Attention-Based Dense LSTM for Speech Emotion Recognition
    Xie, Yue
    Liang, Ruiyu
    Liang, Zhenlin
    Zhao, Li
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2019, E102D (07): : 1426 - 1429
  • [8] ATTENTION DRIVEN FUSION FOR MULTI-MODAL EMOTION RECOGNITION
    Priyasad, Darshana
    Fernando, Tharindu
    Denman, Simon
    Sridharan, Sridha
    Fookes, Clinton
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 3227 - 3231
  • [9] FACIAL EMOTION RECOGNITION USING LIGHT FIELD IMAGES WITH DEEP ATTENTION-BASED BIDIRECTIONAL LSTM
    Sepas-Moghaddam, Alireza
    Etemad, Ali
    Pereira, Fernando
    Correia, Paulo Lobato
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 3367 - 3371
  • [10] Attention-based Multi-modal Sentiment Analysis and Emotion Detection in Conversation using RNN
    Huddar, Mahesh G.
    Sannakki, Sanjeev S.
    Rajpurohit, Vijay S.
    [J]. INTERNATIONAL JOURNAL OF INTERACTIVE MULTIMEDIA AND ARTIFICIAL INTELLIGENCE, 2021, 6 (06): : 112 - 121