Research on cross-modal emotion recognition based on multi-layer semantic fusion

Cited by: 0
Authors: Xu Z. [1]; Gao Y. [1]
Affiliation: [1] College of Information Engineering, Shanghai Maritime University, Shanghai
Funding: National Natural Science Foundation of China
Keywords: cascade encoder; inter-modal information complementation; Mask-gated Fusion Networks (MGF-module); multimodal emotion recognition; multimodal fusion
DOI: 10.3934/mbe.2024110
Abstract
Multimodal emotion analysis integrates information from multiple modalities to better understand human emotions. In this paper, we propose the Cross-modal Emotion Recognition based on multi-layer semantic fusion (CM-MSF) model, which aims to exploit the complementarity of salient information between modalities and to extract high-level features adaptively. To obtain comprehensive and rich features from multimodal sources across different dimensions and depth levels, we design a parallel deep learning module that extracts features from each modality separately while keeping the extracted features aligned at low cost. A cascaded cross-modal encoder built from Bidirectional Long Short-Term Memory (BiLSTM) layers and one-dimensional convolutions (Conv1D) is then introduced to enable inter-modal information complementation; this module integrates information across modalities and addresses the challenges posed by signal heterogeneity. To allow flexible, adaptive information selection and delivery, we design the Mask-gated Fusion Networks (MGF-module), which combines masking with gating structures: gating vectors give precise control over each modality's information flow, mitigating the low recognition accuracy and emotional misjudgment caused by complex features and noisy, redundant information. The CM-MSF model was evaluated on the widely used multimodal emotion recognition datasets CMU-MOSI and CMU-MOSEI. It achieves binary classification accuracies of 89.1% and 88.6%, and F1 scores of 87.9% and 88.1%, on CMU-MOSI and CMU-MOSEI, respectively, validating the effectiveness of our approach in recognizing and classifying emotions. ©2024 the Author(s), licensee AIMS Press.
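The abstract describes two mechanisms: a cascaded BiLSTM + Conv1D encoder per modality stream, and a mask-gated fusion module whose gating vectors control how much of each modality's features reach the fused representation. The sketch below is a minimal, hypothetical PyTorch rendering of those ideas; the layer sizes, the sigmoid gating formulation, and the way the mask is applied are illustrative assumptions, not the authors' published implementation.

```python
# Minimal sketch of a cascaded BiLSTM + Conv1D encoder and a mask-gated
# fusion step, loosely following the abstract. All hyperparameters and the
# exact gating form are assumptions made for illustration.
import torch
import torch.nn as nn


class CrossModalEncoder(nn.Module):
    """Cascaded BiLSTM + Conv1D encoder for one modality stream."""

    def __init__(self, in_dim: int, hidden: int):
        super().__init__()
        self.bilstm = nn.LSTM(in_dim, hidden, batch_first=True,
                              bidirectional=True)
        # Conv1D over the temporal axis refines the BiLSTM outputs and
        # projects them back to a shared hidden size.
        self.conv = nn.Conv1d(2 * hidden, hidden, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, in_dim)
        h, _ = self.bilstm(x)               # (batch, time, 2*hidden)
        h = self.conv(h.transpose(1, 2))    # (batch, hidden, time)
        return h.transpose(1, 2)            # (batch, time, hidden)


class MaskGatedFusion(nn.Module):
    """Hypothetical MGF: gate each modality's features, then sum them."""

    def __init__(self, hidden: int, n_modalities: int = 3):
        super().__init__()
        self.gates = nn.ModuleList(
            nn.Linear(hidden, hidden) for _ in range(n_modalities))

    def forward(self, feats, masks):
        # feats: list of (batch, time, hidden); masks: list of (batch, time)
        fused = 0
        for f, m, gate in zip(feats, masks, self.gates):
            g = torch.sigmoid(gate(f))               # per-dim gating vector
            fused = fused + g * f * m.unsqueeze(-1)  # zero out padded steps
        return fused


if __name__ == "__main__":
    text = torch.randn(2, 20, 300)   # e.g., word embeddings
    audio = torch.randn(2, 20, 74)   # e.g., acoustic features
    enc_t, enc_a = CrossModalEncoder(300, 128), CrossModalEncoder(74, 128)
    masks = [torch.ones(2, 20), torch.ones(2, 20)]
    mgf = MaskGatedFusion(128, n_modalities=2)
    out = mgf([enc_t(text), enc_a(audio)], masks)
    print(out.shape)  # torch.Size([2, 20, 128])
```

The gating here lets the network attenuate noisy or redundant dimensions per modality before fusion, which is the stated motivation for the MGF-module; the paper's actual masking and gating details may differ.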
Pages: 2488-2514 (26 pages)
Related Papers (50 in total)
  • [31] Multi-Modal Fusion Emotion Recognition Based on HMM and ANN
    Xu, Chao
    Cao, Tianyi
    Feng, Zhiyong
    Dong, Caichao
    CONTEMPORARY RESEARCH ON E-BUSINESS TECHNOLOGY AND STRATEGY, 2012, 332 : 541 - 550
  • [32] Joint low-rank tensor fusion and cross-modal attention for multimodal physiological signals based emotion recognition
    Wan, Xin
    Wang, Yongxiong
    Wang, Zhe
    Tang, Yiheng
    Liu, Benke
    PHYSIOLOGICAL MEASUREMENT, 2024, 45 (07)
  • [33] Cross-modal semantic priming
    Tabossi, P
    LANGUAGE AND COGNITIVE PROCESSES, 1996, 11 (06): : 569 - 576
  • [34] Label graph learning for multi-label image recognition with cross-modal fusion
    Xie, Yanzhao
    Wang, Yangtao
    Liu, Yu
    Zhou, Ke
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (18) : 25363 - 25381
  • [36] Cross-Modal Semantic Communications
    Li, Ang
    Wei, Xin
    Wu, Dan
    Zhou, Liang
    IEEE WIRELESS COMMUNICATIONS, 2022, 29 (06) : 144 - 151
  • [37] A Sign Language Recognition Framework Based on Cross-Modal Complementary Information Fusion
    Zhang, Jiangtao
    Wang, Qingshan
    Wang, Qi
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 8131 - 8144
  • [38] Cross-modal video retrieval algorithm based on multi-semantic clues
    Ding L.
    Li Y.
    Yu C.
    Liu Y.
    Wang X.
    Qi S.
    Beijing Hangkong Hangtian Daxue Xuebao/Journal of Beijing University of Aeronautics and Astronautics, 2021, 47 (03): : 596 - 604
  • [39] Multi-attention based semantic deep hashing for cross-modal retrieval
    Zhu, Liping
    Tian, Gangyi
    Wang, Bingyao
    Wang, Wenjie
    Zhang, Di
    Li, Chengyang
    APPLIED INTELLIGENCE, 2021, 51 (08) : 5927 - 5939