Bimodal Fusion Network with Multi-Head Attention for Multimodal Sentiment Analysis

Cited by: 3
Authors:
Zhang, Rui [1,2]
Xue, Chengrong [1,2]
Qi, Qingfu [3]
Lin, Liyuan [2]
Zhang, Jing [1,2]
Zhang, Lun [1,2]
Affiliations:
[1] Tianjin Sino German Univ Appl Sci, Sch Software & Commun, Tianjin 300222, Peoples R China
[2] Tianjin Univ Sci & Technol, Coll Elect Informat & Automation, Tianjin 300222, Peoples R China
[3] Gaussian Robot Pte Ltd, Tianjin 200100, Peoples R China
Source:
APPLIED SCIENCES-BASEL, 2023, Vol. 13, No. 3
Keywords:
multimodal sentiment analysis; bimodal fusion; multi-head attention; emotion recognition; features
DOI: 10.3390/app13031915
Chinese Library Classification: O6 [Chemistry]
Subject Classification Code: 0703
Abstract:
The growing richness of social media expression has made multimodal sentiment analysis a research hotspot. However, modality heterogeneity poses great difficulties for effective cross-modal fusion, particularly the modality alignment problem and uncontrolled vector offset during fusion. In this paper, we propose a bimodal multi-head attention network (BMAN) based on text and audio, which adaptively captures intramodal utterance features and complex intermodal alignment relationships. Specifically, we first employ two independent unimodal encoders to extract the semantic features within each modality. Considering that different modalities deserve different weights, we then build a joint decoder that fuses the audio information into the text representation using learnable weights, avoiding an unreasonable vector offset. The resulting cross-modal representation is used to improve sentiment prediction performance. Experiments on both the aligned and unaligned CMU-MOSEI datasets show that our model outperforms multiple baselines, with outstanding advantages in solving the cross-modal alignment problem.
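To make the fusion scheme concrete, below is a minimal PyTorch sketch of the kind of bimodal attention fusion the abstract describes: two independent unimodal encoders, a cross-modal multi-head attention step in which text queries attend to audio, and a learnable weight that folds the attended audio back into the text representation. All dimensions, layer counts, and the scalar gate `alpha` are illustrative assumptions, not values taken from the paper.

```python
import torch
import torch.nn as nn

class BMANSketch(nn.Module):
    """Minimal sketch of bimodal multi-head attention fusion.

    Hyperparameters (d_model, n_heads, feature dims) are illustrative
    assumptions, not the configuration reported in the paper.
    """
    def __init__(self, d_text=768, d_audio=74, d_model=128, n_heads=4):
        super().__init__()
        # Independent unimodal encoders: project each modality into a
        # shared width, then model intramodal context per modality.
        self.text_proj = nn.Linear(d_text, d_model)
        self.audio_proj = nn.Linear(d_audio, d_model)
        self.text_enc = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.audio_enc = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        # Joint decoder step: text tokens query the audio sequence, so
        # the text-audio alignment is learned rather than assumed.
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Learnable fusion weight (a hypothetical stand-in for the paper's
        # learnable weights) keeps the fused vector anchored to the text
        # representation, limiting uncontrolled vector offset.
        self.alpha = nn.Parameter(torch.tensor(0.5))
        self.head = nn.Linear(d_model, 1)  # sentiment intensity score

    def forward(self, text, audio):
        t = self.text_enc(self.text_proj(text))     # (B, Lt, d_model)
        a = self.audio_enc(self.audio_proj(audio))  # (B, La, d_model)
        aligned, _ = self.cross_attn(query=t, key=a, value=a)
        fused = t + self.alpha * aligned            # weighted fusion into text
        return self.head(fused.mean(dim=1))         # utterance-level prediction

model = BMANSketch()
text = torch.randn(2, 50, 768)    # e.g., token-level text features (assumed dims)
audio = torch.randn(2, 120, 74)   # e.g., frame-level acoustic features (assumed dims)
print(model(text, audio).shape)   # torch.Size([2, 1])
```

Because the cross-attention learns the text-audio correspondence itself, the sketch consumes sequences of different lengths directly, which is what lets this style of model handle the unaligned CMU-MOSEI setting.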
Pages: 12
Related Papers
(50 records in total)
  • [31] The sentiment analysis model with multi-head self-attention and Tree-LSTM
    Li, Lei
    Pei, Yijian
    Jin, Chenyang
    SIXTH INTERNATIONAL WORKSHOP ON PATTERN RECOGNITION, 2021, 11913
  • [32] Multi-Head multimodal deep interest recommendation network
    Yang, Mingbao
    Zhou, Peng
    Li, Shaobo
    Zhang, Yuanmeng
    Hu, Jianjun
    Zhang, Ansi
    KNOWLEDGE-BASED SYSTEMS, 2023, 276
  • [33] Multi-Level Attention Map Network for Multimodal Sentiment Analysis
    Xue, Xiaojun
    Zhang, Chunxia
    Niu, Zhendong
    Wu, Xindong
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (05) : 5105 - 5118
  • [34] A multimodal fusion network with attention mechanisms for visual-textual sentiment analysis
    Gan, Chenquan
    Fu, Xiang
    Feng, Qingdong
    Zhu, Qingyi
    Cao, Yang
    Zhu, Ye
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 242
  • [35] Sentiment analysis of social media comments based on multimodal attention fusion network
    Liu, Ziyu
    Yang, Tao
    Chen, Wen
    Chen, Jiangchuan
    Li, Qinru
    Zhang, Jun
    APPLIED SOFT COMPUTING, 2024, 164
  • [36] Sentiment Analysis Using Multi-Head Attention Capsules With Multi-Channel CNN and Bidirectional GRU
    Cheng, Yan
    Sun, Huan
    Chen, Haomai
    Li, Meng
    Cai, Yingying
    Cai, Zhuang
    Huang, Jing
    IEEE ACCESS, 2021, 9 : 60383 - 60395
  • [37] Short Text Sentiment Analysis Based on Multi-Channel CNN With Multi-Head Attention Mechanism
    Feng, Yue
    Cheng, Yan
    IEEE ACCESS, 2021, 9 : 19854 - 19863
  • [38] Multimodal Approach of Speech Emotion Recognition Using Multi-Level Multi-Head Fusion Attention-Based Recurrent Neural Network
    Ho, Ngoc-Huynh
    Yang, Hyung-Jeong
    Kim, Soo-Hyung
    Lee, Gueesang
    IEEE ACCESS, 2020, 8 : 61672 - 61686
  • [39] Retraction Note to: Sentiment analysis of student feedback using multi-head attention fusion model of word and context embedding for LSTM
    Sangeetha, K.
    Prabha, D.
    JOURNAL OF AMBIENT INTELLIGENCE AND HUMANIZED COMPUTING, 2023, 14 (Suppl 1) : 537 - 537
  • [40] RETRACTED ARTICLE: Sentiment analysis of student feedback using multi-head attention fusion model of word and context embedding for LSTM
    Sangeetha, K.
    Prabha, D.
    JOURNAL OF AMBIENT INTELLIGENCE AND HUMANIZED COMPUTING, 2021, 12 : 4117 - 4126