Research on cross-modal emotion recognition based on multi-layer semantic fusion

Cited by: 0
Authors
Xu Z. [1]
Gao Y. [1]
Affiliations
[1] College of Information Engineering, Shanghai Maritime University, Shanghai
Funding
National Natural Science Foundation of China
Keywords
cascade encoder; inter-modal information complementation; Mask-gated Fusion Networks (MGF-module); multimodal emotion recognition; multimodal fusion
DOI
10.3934/mbe.2024110
Abstract
Multimodal emotion analysis integrates information from multiple modalities to better understand human emotions. In this paper, we propose the Cross-modal emotion recognition based on Multi-layer Semantic Fusion (CM-MSF) model, which exploits the complementarity of salient information between modalities and extracts high-level features adaptively. To obtain comprehensive, rich features from multimodal sources across different dimensions and depth levels, we design a parallel deep learning module that extracts features from each individual modality while keeping the extracted features aligned at low cost. A cascaded cross-modal encoder built from Bidirectional Long Short-Term Memory (BiLSTM) layers and one-dimensional convolution (Conv1D) then supports inter-modal information complementation, integrating information across modalities and addressing the challenges posed by signal heterogeneity. For flexible, adaptive information selection and delivery, we design the Mask-gated Fusion Networks (MGF-module), which combines masking with gating structures: gating vectors precisely control the information flow of each modality, mitigating the low recognition accuracy and emotional misjudgment caused by complex features and noisy, redundant information. We evaluated CM-MSF on the widely used multimodal emotion recognition datasets CMU-MOSI and CMU-MOSEI. The model achieves binary classification accuracies of 89.1% and 88.6%, and F1 scores of 87.9% and 88.1%, on CMU-MOSI and CMU-MOSEI, respectively, confirming the effectiveness of our approach in recognizing and classifying emotions. ©2024 the Author(s), licensee AIMS Press.
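As a rough illustration of the two mechanisms named in the abstract, the following PyTorch sketch pairs a BiLSTM-plus-Conv1D encoder with a sigmoid gating vector and an optional binary mask. All layer sizes, class names, and the exact gating formulation are assumptions made for illustration; they are not taken from the paper or its released code.

```python
# Minimal sketch of a cascaded cross-modal encoder (BiLSTM + Conv1D) and a
# mask-gated fusion step, in the spirit of the CM-MSF abstract. Dimensions,
# names, and the gating formula are hypothetical.
import torch
import torch.nn as nn

class CrossModalEncoder(nn.Module):
    """BiLSTM followed by a Conv1D projection over the sequence axis."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.bilstm = nn.LSTM(dim, hidden, batch_first=True, bidirectional=True)
        self.conv = nn.Conv1d(2 * hidden, dim, kernel_size=3, padding=1)

    def forward(self, x):                   # x: (batch, seq, dim)
        h, _ = self.bilstm(x)               # (batch, seq, 2*hidden)
        h = self.conv(h.transpose(1, 2))    # Conv1d expects (batch, channels, seq)
        return h.transpose(1, 2)            # back to (batch, seq, dim)

class MaskGatedFusion(nn.Module):
    """Gating vector selects between two modalities; a mask drops noisy steps."""
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, a, b, mask=None):     # a, b: (batch, seq, dim)
        g = torch.sigmoid(self.gate(torch.cat([a, b], dim=-1)))
        fused = g * a + (1.0 - g) * b       # soft, per-feature modality selection
        if mask is not None:                # mask: (batch, seq, 1), 1 = keep
            fused = fused * mask
        return fused

# Usage sketch with dummy text/audio features of matching shape.
enc = CrossModalEncoder(dim=64, hidden=32)
fuse = MaskGatedFusion(dim=64)
text, audio = torch.randn(8, 20, 64), torch.randn(8, 20, 64)
out = fuse(enc(text), enc(audio))           # (8, 20, 64)
```

Note that the paper describes parallel per-modality extractors feeding the cascaded encoder; a single encoder is reused here only to keep the sketch short.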
Pages: 2488-2514 (26 pages)
Related papers (50 total)
  • [41] Yang, Yang; Zhang, Yao; Zeng, Qingliang. Research on coal gangue recognition based on multi-layer time domain feature processing and recognition features cross-optimal fusion. MEASUREMENT, 2022, 204.
  • [42] Bhatti, Anubhav; Behinaein, Behnam; Rodenburg, Dirk; Hungler, Paul; Etemad, Ali. Attentive Cross-modal Connections for Deep Multimodal Wearable-based Emotion Recognition. 2021 9TH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION WORKSHOPS AND DEMOS (ACIIW), 2021.
  • [43] Liu, Yucheng; Jia, Ziyu; Wang, Haichao. EmotionKD: A Cross-Modal Knowledge Distillation Framework for Emotion Recognition Based on Physiological Signals. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023: 6122-6131.
  • [44] Wei, Puling; Yang, Juan; Xiao, Yali. Hierarchical Cross-Modal Interaction and Fusion Network Enhanced with Self-Distillation for Emotion Recognition in Conversations. ELECTRONICS, 2024, 13 (13).
  • [45] Praveen, R. Gnana; Alam, Jahangir. Incongruity-Aware Cross-Modal Attention for Audio-Visual Fusion in Dimensional Emotion Recognition. IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2024, 18 (03): 444-458.
  • [46] Li, Junbing; Zhang, Changqing; Wang, Xueman; Du, Ling. Multi-Scale Cross-Modal Spatial Attention Fusion for Multi-label Image Recognition. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2020, PT I, 2020, 12396: 736-747.
  • [47] Wang, Yangtao; Xie, Yanzhao; Liu, Yu; Zhou, Ke; Li, Xiaocui. Fast Graph Convolution Network Based Multi-label Image Recognition via Cross-modal Fusion. CIKM '20: PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, 2020: 1575-1584.
  • [48] Fan, Jinmeng; Zhou, Hao; Zhou, Fengyu; Wang, Xiaoyan; Liu, Zhi; Li, Xiang-Yang. WiVi: WiFi-Video Cross-Modal Fusion based Multi-Path Gait Recognition System. 2022 IEEE/ACM 30TH INTERNATIONAL SYMPOSIUM ON QUALITY OF SERVICE (IWQOS), 2022.
  • [49] Khare, Aparna; Parthasarathy, Srinivas; Sundaram, Shiva. Self-Supervised Learning with Cross-Modal Transformers for Emotion Recognition. 2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021: 381-388.
  • [50] Bano, Saira; Tonellotto, Nicola; Cassara, Pietro; Gotta, Alberto. FedCMD: A Federated Cross-modal Knowledge Distillation for Drivers' Emotion Recognition. ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2024, 15 (03).