Multimodal fusion: A study on speech-text emotion recognition with the integration of deep learning

Times cited: 0
Authors
Shang, Yanan [1]
Fu, Tianqi [1]
Affiliation
[1] Cangzhou Normal University, Cangzhou 061001, Hebei, People's Republic of China
Keywords
Multimodal fusion; Deep learning; GloVe model; BiGRU; Emotion recognition; Neural network
DOI
10.1016/j.iswa.2024.200436
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recognizing human emotions is valuable in many real-world scenarios. This paper studies the multimodal fusion of speech and text for emotion recognition. A 39-dimensional Mel-frequency cepstral coefficient (MFCC) feature was used for speech emotion, and a 300-dimensional word vector obtained with the GloVe algorithm was used for text emotion. A bidirectional gated recurrent unit (BiGRU) was then used to extract deep features, and it was combined with a multi-head self-attention (MHA) mechanism and an improved sparrow search algorithm (ISSA) to obtain the ISSA-BiGRU-MHA emotion recognition method. The method was validated on the IEMOCAP and MELD datasets. MFCC and GloVe word vectors performed better as features than the alternatives compared. Compared with support vector machine and convolutional neural network methods, ISSA-BiGRU-MHA achieved the highest weighted accuracy and unweighted accuracy. Multimodal fusion achieved weighted accuracies of 76.52%, 71.84%, 66.72%, and 62.12% on the IEMOCAP, MELD, MOSI, and MOSEI datasets, respectively, outperforming unimodal recognition. These results support the reliability and practical applicability of the multimodal fusion recognition method.
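The abstract reports results as weighted accuracy (WA) and unweighted accuracy (UA) without defining them. The sketch below assumes the convention common in speech emotion recognition work on IEMOCAP: WA is overall accuracy across all utterances, and UA is the mean of per-class recalls, so each emotion class counts equally regardless of size. The paper's exact definitions are not given in the abstract, and the function names and toy labels here are hypothetical illustrations, not the authors' code or data.

```python
from collections import Counter
from typing import Hashable, Sequence


def _check(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> None:
    # Both label sequences must be aligned one-to-one and non-empty.
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")


def weighted_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """WA (assumed definition): correct predictions divided by all samples.

    Large classes contribute more, in proportion to their sample counts.
    """
    _check(y_true, y_pred)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """UA (assumed definition): mean of per-class recalls, each class weighted equally."""
    _check(y_true, y_pred)
    totals = Counter(y_true)  # number of samples per true class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    recalls = [hits[c] / n for c, n in totals.items()]
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    # Hypothetical toy labels (not data from the paper); "neu" dominates.
    y_true = ["neu"] * 6 + ["ang"] * 2 + ["sad"] * 2
    y_pred = ["neu"] * 6 + ["neu", "ang"] + ["neu", "neu"]
    print(f"WA = {weighted_accuracy(y_true, y_pred):.2f}")  # WA = 0.70
    print(f"UA = {unweighted_accuracy(y_true, y_pred):.2f}")  # UA = 0.50
```

In the toy example, a classifier that leans toward the majority class still reaches a WA of 0.70, while its UA falls to 0.50 because the "sad" class is never recognized. This gap is why results on class-imbalanced emotion corpora are usually reported with both metrics.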
Pages: 6