Multimodal fusion: A study on speech-text emotion recognition with the integration of deep learning

Cited: 1
Authors
Shang, Yanan [1 ]
Fu, Tianqi [1 ]
Affiliations
[1] Cangzhou Normal Univ, Cangzhou 061001, Hebei, Peoples R China
Source
Keywords
Multimodal fusion; Deep learning; GloVe model; BiGRU; Emotion recognition; NEURAL-NETWORK;
DOI
10.1016/j.iswa.2024.200436
Chinese Library Classification (CLC)
TP18 [Theory of artificial intelligence];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recognition of various human emotions holds significant value in numerous real-world scenarios. This paper focuses on the multimodal fusion of speech and text for emotion recognition. A 39-dimensional Mel-frequency cepstral coefficient (MFCC) feature was used for speech emotion, and a 300-dimensional word vector obtained through the GloVe algorithm was used as the feature for text emotion. The bidirectional gated recurrent unit (BiGRU) method in deep learning was added for extracting deep features. It was then combined with the multi-head self-attention (MHA) mechanism and the improved sparrow search algorithm (ISSA) to obtain the ISSA-BiGRU-MHA method for emotion recognition, which was validated on the IEMOCAP and MELD datasets. It was found that MFCC and GloVe word vectors exhibited superior recognition effects as features. Comparisons with the support vector machine and convolutional neural network methods revealed that the ISSA-BiGRU-MHA method achieved the highest weighted and unweighted accuracies. Multimodal fusion achieved weighted accuracies of 76.52%, 71.84%, 66.72%, and 62.12% on the IEMOCAP, MELD, MOSI, and MOSEI datasets, respectively, outperforming the unimodal approaches. These results affirm the reliability of the multimodal fusion recognition method and show its practical applicability.
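As a rough illustration of the pipeline described in the abstract, the sketch below (not the authors' code) encodes a 39-dimensional MFCC frame sequence and a 300-dimensional GloVe word-vector sequence with separate BiGRUs, fuses them, and applies multi-head self-attention before classification. The hidden size, number of attention heads, four-class output, and concatenation-based fusion are illustrative assumptions, and the ISSA hyperparameter search is omitted.

# Minimal sketch of a speech-text fusion model (assumed layer sizes; not the paper's code).
import torch
import torch.nn as nn

class SpeechTextFusion(nn.Module):
    def __init__(self, mfcc_dim=39, glove_dim=300, hidden=128, heads=4, n_classes=4):
        super().__init__()
        # One BiGRU per modality; outputs have size 2*hidden due to bidirectionality.
        self.speech_gru = nn.GRU(mfcc_dim, hidden, batch_first=True, bidirectional=True)
        self.text_gru = nn.GRU(glove_dim, hidden, batch_first=True, bidirectional=True)
        # Multi-head self-attention over the concatenated modality sequences.
        self.mha = nn.MultiheadAttention(embed_dim=2 * hidden, num_heads=heads,
                                         batch_first=True)
        self.classifier = nn.Linear(2 * hidden, n_classes)

    def forward(self, mfcc, glove):
        # mfcc:  (batch, T_audio, 39)  MFCC frame sequence
        # glove: (batch, T_text, 300)  GloVe word-vector sequence
        s, _ = self.speech_gru(mfcc)      # (batch, T_audio, 2*hidden)
        t, _ = self.text_gru(glove)       # (batch, T_text, 2*hidden)
        fused = torch.cat([s, t], dim=1)  # concatenate along the time axis
        attn, _ = self.mha(fused, fused, fused)
        pooled = attn.mean(dim=1)         # mean-pool the attended features
        return self.classifier(pooled)    # emotion logits

# Example: one utterance of about 300 MFCC frames with a 20-word transcript.
model = SpeechTextFusion()
logits = model(torch.randn(1, 300, 39), torch.randn(1, 20, 300))
print(logits.shape)  # torch.Size([1, 4])

In this sketch the two modalities are fused simply by concatenating their BiGRU outputs along the time axis before attention; the paper does not specify this fusion scheme, and other strategies (e.g. cross-attention or score-level fusion) would fit the same interface.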
Pages: 6
Related Papers
50 records in total
  • [1] Overview of EmoSPeech at IberLEF 2024: Multimodal Speech-text Emotion Recognition in Spanish
    Pan, Ronghao
    Antonio Garcia-Diaz, Jose
    Angel Rodriguez-Garcia, Miguel
    Garcia-Sanchez, Francisco
    Valencia-Garcia, Rafael
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2024, (73): : 359 - 368
  • [2] A Survey of Deep Learning-Based Multimodal Emotion Recognition: Speech, Text, and Face
    Lian, Hailun
    Lu, Cheng
    Li, Sunan
    Zhao, Yan
    Tang, Chuangao
    Zong, Yuan
    ENTROPY, 2023, 25 (10)
  • [3] Feature Fusion of Speech Emotion Recognition Based on Deep Learning
    Liu, Gang
    He, Wei
    Jin, Bicheng
    PROCEEDINGS OF 2018 INTERNATIONAL CONFERENCE ON NETWORK INFRASTRUCTURE AND DIGITAL CONTENT (IEEE IC-NIDC), 2018, : 193 - 197
  • [4] Exploring Semantic Understanding and Generative Modeling in Speech-Text Multimodal Data Fusion
    Yu, Haitao
    Wang, Xuqiang
    Sun, Yifan
    Yang, Yifan
    Sun, Yan
    APPLIED MATHEMATICS AND NONLINEAR SCIENCES, 2024, 9 (01)
  • [5] Learning deep multimodal affective features for spontaneous speech emotion recognition
    Zhang, Shiqing
    Tao, Xin
    Chuang, Yuelong
    Zhao, Xiaoming
    SPEECH COMMUNICATION, 2021, 127 : 73 - 81
  • [6] MULTIMODAL SPEECH EMOTION RECOGNITION USING AUDIO AND TEXT
    Yoon, Seunghyun
    Byun, Seokhyun
    Jung, Kyomin
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 112 - 118
  • [7] TS-MEFM: A New Multimodal Speech Emotion Recognition Network Based on Speech and Text Fusion
    Wei, Wei
    Zhang, Bingkun
    Wang, Yibing
    MULTIMEDIA MODELING, MMM 2025, PT IV, 2025, 15523 : 454 - 467
  • [8] Speech emotion recognition using multimodal feature fusion with machine learning approach
    Panda, Sandeep Kumar
    Jena, Ajay Kumar
    Panda, Mohit Ranjan
    Panda, Susmita
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (27) : 42763 - 42781
  • [9] Polish Speech and Text Emotion Recognition in a Multimodal Emotion Analysis System
    Skowronski, Kamil
    Galuszka, Adam
    Probierz, Eryka
    APPLIED SCIENCES-BASEL, 2024, 14 (22):