Multi-modal emotion recognition using EEG and speech signals

Cited by: 28
Authors
Wang, Qian [1 ]
Wang, Mou [1 ]
Yang, Yan [1 ]
Zhang, Xiaolei [1 ]
Affiliations
[1] Northwestern Polytechnical University, Xi'an 710072, Shaanxi, People's Republic of China
Keywords
Multi-modal emotion database; EEG emotion recognition; Speech emotion recognition; Physiological signal; Data fusion; IDENTIFICATION; DATABASE; MACHINE
DOI
10.1016/j.compbiomed.2022.105907
Chinese Library Classification (CLC)
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
Automatic Emotion Recognition (AER) is critical for naturalistic Human-Machine Interaction (HMI). Emotions can be detected both through external behaviors, e.g., tone of voice, and through internal physiological signals, e.g., the electroencephalogram (EEG). In this paper, we first constructed a multi-modal emotion database, named the Multi-modal Emotion Database with four modalities (MED4). MED4 consists of synchronously recorded EEG, photoplethysmography, speech, and facial-image signals from participants exposed to video stimuli designed to induce happy, sad, angry, and neutral emotions. The experiment was performed with 32 participants under two environmental conditions: a research lab with natural noise and an anechoic chamber. Four baseline algorithms were developed to validate the database and benchmark AER performance: Identification-vector + Probabilistic Linear Discriminant Analysis (I-vector + PLDA), Temporal Convolutional Network (TCN), Extreme Learning Machine (ELM), and Multi-Layer Perceptron (MLP). Furthermore, two fusion strategies, at the feature level and the decision level respectively, were designed to exploit both external and internal information about human status. The results showed that EEG signals yield higher emotion-recognition accuracy than speech signals (88.92% in the anechoic room and 89.70% in the naturally noisy room, vs. 64.67% and 58.92% respectively). Fusion strategies that combine speech and EEG signals improved overall accuracy by 25.92% over speech alone and by 1.67% over EEG alone in the anechoic room, and by 31.74% and 0.96% respectively in the naturally noisy room. Fusion methods also enhance the robustness of AER in noisy environments. The MED4 database will be made publicly available, in order to encourage researchers all over the world to develop and validate advanced AER methods.
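As a concrete illustration of the two fusion strategies described in the abstract, the following is a minimal, hypothetical Python sketch of feature-level fusion (concatenating per-modality feature vectors before one classifier) and decision-level fusion (weighted averaging of per-modality class posteriors). It is not the authors' implementation: the synthetic features, the feature dimensions, the modality weight, and the scikit-learn MLPClassifier stand-in are all assumptions.

    # Hypothetical sketch of feature-level vs. decision-level fusion;
    # synthetic data and all hyperparameters are assumptions, not MED4 code.
    import numpy as np
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(0)
    n, d_eeg, d_speech = 200, 32, 40           # arbitrary sizes
    X_eeg = rng.normal(size=(n, d_eeg))        # stand-in for EEG features
    X_speech = rng.normal(size=(n, d_speech))  # stand-in for speech features
    y = rng.integers(0, 4, size=n)             # 4 classes: happy/sad/angry/neutral

    # Feature-level fusion: concatenate modality features, train one classifier.
    clf_feat = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500)
    clf_feat.fit(np.hstack([X_eeg, X_speech]), y)

    # Decision-level fusion: one classifier per modality, then combine
    # their class posteriors with an assumed fixed modality weight.
    clf_eeg = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500).fit(X_eeg, y)
    clf_sp = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500).fit(X_speech, y)
    w = 0.7                                    # EEG weighted higher, per its accuracy
    proba = w * clf_eeg.predict_proba(X_eeg) + (1 - w) * clf_sp.predict_proba(X_speech)
    pred = proba.argmax(axis=1)                # fused 4-class prediction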
Pages: 13
Related papers
50 records in total
  • [1] Implementation of Multi-modal Speech Emotion Recognition Using Text Data and Audio Signals
    Adesola, Falade
    Adeyinka, Omirinlewo
    Kayode, Akindeji
    Ayodele, Adebiyi
    2023 International Conference on Science, Engineering and Business for Sustainable Development Goals (SEB-SDG 2023), 2023.
  • [2] Multi-modal Attention for Speech Emotion Recognition
    Pan, Zexu
    Luo, Zhaojie
    Yang, Jichen
    Li, Haizhou
    INTERSPEECH 2020, 2020, pp. 364-368.
  • [3] Multi-Modal Emotion Recognition Based On deep Learning Of EEG And Audio Signals
    Li, Zhongjie
    Zhang, Gaoyan
    Dang, Jianwu
    Wang, Longbiao
    Wei, Jianguo
    2021 International Joint Conference on Neural Networks (IJCNN), 2021.
  • [4] Multi-modal Emotion Recognition using Speech Features and Text Embedding
    Kim, Ju-Hee
    Lee, Seok-Pil
    Transactions of the Korean Institute of Electrical Engineers, 2021, 70(1): 108-113.
  • [5] Multi-modal Emotion Recognition Based on Speech and Image
    Li, Yongqiang
    He, Qi
    Zhao, Yongping
    Yao, Hongxun
    Advances in Multimedia Information Processing - PCM 2017, Pt I, 2018, 10735: 844-853.
  • [6] Multi-modal Correlated Network for emotion recognition in speech
    Ren, Minjie
    Nie, Weizhi
    Liu, Anan
    Su, Yuting
    Visual Informatics, 2019, 3(3): 150-155.
  • [7] Emotion recognition with multi-modal peripheral physiological signals
    Gohumpu, Jennifer
    Xue, Mengru
    Bao, Yanchi
    Frontiers in Computer Science, 2023, 5.
  • [8] Multi-Modal Emotion Recognition Using Speech Features and Text-Embedding
    Byun, Sung-Woo
    Kim, Ju-Hee
    Lee, Seok-Pil
    Applied Sciences-Basel, 2021, 11(17).
  • [9] Contextual and Cross-Modal Interaction for Multi-Modal Speech Emotion Recognition
    Yang, Dingkang
    Huang, Shuai
    Liu, Yang
    Zhang, Lihua
    IEEE Signal Processing Letters, 2022, 29: 2093-2097.
  • [10] Hidden Emotion Detection using Multi-modal Signals
    Kim, Dae Ha
    Song, Byung Cheol
    Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems (CHI '21), 2021.