Multimodal fusion: A study on speech-text emotion recognition with the integration of deep learning

Cited by: 1
Authors
Shang, Yanan [1 ]
Fu, Tianqi [1 ]
Affiliation
[1] Cangzhou Normal Univ, Cangzhou 061001, Hebei, Peoples R China
Source
INTELLIGENT SYSTEMS WITH APPLICATIONS | 2024, Vol. 24
Keywords
Multimodal fusion; Deep learning; GloVe model; BiGRU; Emotion recognition; Neural network
DOI
10.1016/j.iswa.2024.200436
CLC number
TP18 [Artificial intelligence theory]
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recognition of various human emotions holds significant value in numerous real-world scenarios. This paper focuses on the multimodal fusion of speech and text for emotion recognition. A 39-dimensional Mel-frequency cepstral coefficient (MFCC) vector was used as the speech emotion feature, and a 300-dimensional word vector obtained with the GloVe algorithm was used as the text emotion feature. A bidirectional gated recurrent unit (BiGRU) from deep learning was added to extract deep features. It was then combined with a multi-head self-attention (MHA) mechanism and the improved sparrow search algorithm (ISSA) to obtain the ISSA-BiGRU-MHA method for emotion recognition, which was validated on the IEMOCAP and MELD datasets. MFCC and GloVe word vectors exhibited superior recognition performance as features. Compared with support vector machine and convolutional neural network methods, the ISSA-BiGRU-MHA method achieved the highest weighted and unweighted accuracies. Multimodal fusion reached weighted accuracies of 76.52%, 71.84%, 66.72%, and 62.12% on the IEMOCAP, MELD, MOSI, and MOSEI datasets, outperforming unimodal recognition. These results affirm the reliability and practical applicability of the multimodal fusion recognition method.
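The abstract applies multi-head self-attention on top of BiGRU outputs over fused speech-text features. A minimal NumPy sketch of that attention step is shown below; it is an illustration under stated assumptions, not the authors' implementation: the random projection matrices stand in for learned Q/K/V/output weights, and the 50x64 input stands in for a BiGRU output sequence over fused MFCC and GloVe features.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, num_heads, rng):
    """One MHA layer over a (seq_len, d_model) feature sequence."""
    seq_len, d_model = X.shape
    assert d_model % num_heads == 0
    d_k = d_model // num_heads
    # Random projections stand in for learned Q/K/V/output weights.
    Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                      for _ in range(4))

    def split(H):  # (seq_len, d_model) -> (num_heads, seq_len, d_k)
        return H.reshape(seq_len, num_heads, d_k).transpose(1, 0, 2)

    Q, K, V = split(X @ Wq), split(X @ Wk), split(X @ Wv)
    # Scaled dot-product attention, computed per head.
    attn = softmax(Q @ K.transpose(0, 2, 1) / np.sqrt(d_k))
    out = (attn @ V).transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ Wo, attn

rng = np.random.default_rng(0)
# Stand-in for BiGRU output: 50 time steps of 64-d fused speech-text features.
feats = rng.standard_normal((50, 64))
out, attn = multi_head_self_attention(feats, num_heads=4, rng=rng)
print(out.shape)   # (50, 64): same shape as the input sequence
```

Each attention head attends over all 50 time steps, so each row of the per-head weight matrix is a probability distribution summing to 1; in the paper's pipeline such a layer would let emotionally salient frames or words reweight the fused sequence before classification.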
Pages: 6
Related papers (50 total)
  • [41] Multimodal Emotion Recognition using Deep Learning Architectures
    Ranganathan, Hiranmayi
    Chakraborty, Shayok
    Panchanathan, Sethuraman
    2016 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2016), 2016,
  • [42] Deep Learning-Based Emotion Recognition by Fusion of Facial Expressions and Speech Features
    Vardhan, Jasthi Vivek
    Chakravarti, Yelavarti Kalyan
    Chand, Annam Jitin
    2024 2ND WORLD CONFERENCE ON COMMUNICATION & COMPUTING, WCONF 2024, 2024,
  • [43] Annotation Efficiency in Multimodal Emotion Recognition with Deep Learning
    Zhu, Lili
    Spachos, Petros
    2022 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM 2022), 2022, : 560 - 565
  • [44] MFGCN: Multimodal fusion graph convolutional network for speech emotion recognition
    Qi, Xin
    Wen, Yujun
    Zhang, Pengzhou
    Huang, Heyan
    NEUROCOMPUTING, 2025, 611
  • [45] Learning Mutual Correlation in Multimodal Transformer for Speech Emotion Recognition
    Wang, Yuhua
    Shen, Guang
    Xu, Yuezhu
    Li, Jiahang
    Zhao, Zhengdao
    INTERSPEECH 2021, 2021, : 4518 - 4522
  • [46] Comparing Recognition Performance and Robustness of Multimodal Deep Learning Models for Multimodal Emotion Recognition
    Liu, Wei
    Qiu, Jie-Lin
    Zheng, Wei-Long
    Lu, Bao-Liang
    IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2022, 14 (02) : 715 - 729
  • [47] A Cross-Culture Study on Multimodal Emotion Recognition Using Deep Learning
    Gan, Lu
    Liu, Wei
    Luo, Yun
    Wu, Xun
    Lu, Bao-Liang
    NEURAL INFORMATION PROCESSING (ICONIP 2019), PT IV, 2019, 1142 : 670 - 680
  • [48] Convolutional Attention Networks for Multimodal Emotion Recognition from Speech and Text Data
    Lee, Chan Woo
    Song, Kyu Ye
    Jeong, Jihoon
    Choi, Woo Yong
    FIRST GRAND CHALLENGE AND WORKSHOP ON HUMAN MULTIMODAL LANGUAGE (CHALLENGE-HML), 2018, : 28 - 34
  • [49] Multimodal Speech Emotion Recognition using Cross Attention with Aligned Audio and Text
    Lee, Yoonhyung
    Yoon, Seunghyun
    Jung, Kyomin
    INTERSPEECH 2020, 2020, : 2717 - 2721
  • [50] Ensemble deep learning with HuBERT for speech emotion recognition
    Yang, Janghoon
    2023 IEEE 17TH INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING, ICSC, 2023, : 153 - 154