Research on cross-modal emotion recognition based on multi-layer semantic fusion

Cited by: 0
Authors
Xu Z. [1]
Gao Y. [1]
Affiliation
[1] College of Information Engineering, Shanghai Maritime University, Shanghai
Funding
National Natural Science Foundation of China
Keywords
cascade encoder; inter-modal information complementation; Mask-gated Fusion Networks (MGF-module); multimodal emotion recognition; multimodal fusion
DOI
10.3934/mbe.2024110
Abstract
Multimodal emotion analysis integrates information from multiple modalities to better understand human emotions. In this paper, we propose the Cross-modal Emotion Recognition based on Multi-layer Semantic Fusion (CM-MSF) model, which exploits the complementarity of important information between modalities and extracts high-level features adaptively. To extract comprehensive features from multimodal sources across different dimensions and depth levels, we design a parallel deep learning module that extracts features from each modality individually and keeps the alignment of the extracted features cost-effective. We then introduce a cascaded cross-modal encoder module, built on a Bidirectional Long Short-Term Memory (BiLSTM) layer and a one-dimensional convolution (Conv1D) layer, to facilitate inter-modal information complementation. This module integrates information across modalities and addresses the challenges posed by signal heterogeneity. For flexible and adaptive information selection and delivery, we design the Mask-gated Fusion Networks (MGF-module), which combines masking with gating structures. Gating vectors control the information flow of each modality, mitigating the low recognition accuracy and emotional misjudgment caused by complex features and noisy, redundant information. We evaluate CM-MSF on the widely used multimodal emotion recognition datasets CMU-MOSI and CMU-MOSEI. The model achieves binary classification accuracies of 89.1% and 88.6% and F1 scores of 87.9% and 88.1% on CMU-MOSI and CMU-MOSEI, respectively. These results support the effectiveness of our approach in recognizing and classifying emotions.
©2024 the Author(s), licensee AIMS Press.
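The abstract describes gating vectors that control how much of each modality reaches the fused representation, combined with masking. The record does not give the MGF-module's equations, so the following is only a minimal sketch of the general idea under stated assumptions: each modality has a feature vector of the same length, a sigmoid gate is computed from given logits, a 0/1 mask can block individual entries, and the gated features are summed. The function name `masked_gated_fusion`, its arguments, and the summation rule are all hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def masked_gated_fusion(features, gate_logits, masks):
    """Fuse per-modality feature vectors with masked sigmoid gates.

    All three arguments are dicts keyed by modality name; every value is a
    list of the same length (floats for features and logits, 0/1 for masks).
    The names and the fusion rule are illustrative assumptions, not the
    paper's MGF-module.
    """
    dim = len(next(iter(features.values())))
    fused = [0.0] * dim
    for name, vec in features.items():
        for i in range(dim):
            # Gate in (0, 1), forced to 0 wherever the mask blocks the entry.
            gate = sigmoid(gate_logits[name][i]) * masks[name][i]
            fused[i] += gate * vec[i]
    return fused


if __name__ == "__main__":
    features = {"text": [1.0, 2.0], "audio": [4.0, 6.0]}
    gate_logits = {"text": [0.0, 0.0], "audio": [0.0, 0.0]}
    masks = {"text": [1, 1], "audio": [1, 0]}  # audio is masked out in slot 1
    print(masked_gated_fusion(features, gate_logits, masks))
```

In a trained model the gate logits would be produced by learned layers rather than passed in by hand; here they are inputs so the effect of the gate and the mask on each modality is easy to inspect.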
Pages: 2488-2514
Number of pages: 26