Hybrid cross-modal interaction learning for multimodal sentiment analysis

Cited by: 5
Authors
Fu, Yanping [1 ]
Zhang, Zhiyuan [2 ]
Yang, Ruidi [1 ]
Yao, Cuiyou [1 ]
Affiliations
[1] Capital Univ Econ & Business, Sch Management & Engn, Beijing 100070, Peoples R China
[2] Beijing Jiaotong Univ, Sch Elect & Informat Engn, Key Lab Commun & Informat Syst, Beijing Municipal Commiss Educ, Beijing 100044, Peoples R China
Keywords
Multimodal sentiment analysis; Cross-modal interaction; Contrastive learning; Modal gap
DOI
10.1016/j.neucom.2023.127201
CLC Classification Number
TP18 [Artificial intelligence theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Multimodal sentiment analysis (MSA) predicts the sentiment polarity of an unlabeled utterance that carries multiple modalities, such as text, vision and audio, by learning from labeled utterances. Existing fusion methods mainly focus on establishing relationships among the characteristics of different modalities to enhance emotion recognition, but they tend to ignore the full range of interactions between modalities, especially cross-modal interaction, which is critical to the sentiment decision on multimodal data. To address these issues, we propose a novel hybrid cross-modal interaction learning (HCIL) framework that jointly learns intra-modal, inter-modal, interactive-modal and cross-modal interactions, so that the model can fully exploit the sentiment information of all modalities and strengthen the sentiment assistance between them. Specifically, we propose two core substructures to learn discriminative multimodal features. One is a contrastive learning interaction structure that tracks class dynamics within each modality, reduces the modal gap between modalities and establishes emotional communication across interactive modalities; the other is a cross-modal prediction structure that builds the sentiment relationship between cross-modal pairs, in particular exploring the auxiliary sentiment effect of audio on vision and text. Furthermore, we adopt a hierarchical feature fusion structure to generate the multimodal feature for the final sentiment prediction. Extensive experiments on three benchmark datasets show that HCIL achieves strong performance on the MSA task and that the cross-modal interaction structure directly improves sentiment classification performance.
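The abstract names the interaction mechanisms but does not give their form, so the following is a minimal, hedged sketch of how an inter-modal contrastive term and an audio-to-vision/text cross-modal prediction head could be wired together in PyTorch. All module names, feature dimensions, and the InfoNCE-style loss are illustrative assumptions, not the authors' HCIL implementation (the actual architecture is defined in the paper at the DOI above).

# Minimal, self-contained sketch (not the authors' code) of two ideas the
# abstract describes: (1) an inter-modal contrastive term that pulls paired
# text/vision/audio features of the same utterance together to shrink the
# modal gap, and (2) a cross-modal prediction head in which audio features
# predict the vision and text features of the same utterance.
# Module names, dimensions, and losses are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


def info_nce(anchor: torch.Tensor, positive: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE-style loss: matched rows in a batch are positives, all others negatives."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    logits = anchor @ positive.t() / temperature            # (B, B) similarity matrix
    targets = torch.arange(anchor.size(0), device=anchor.device)
    return F.cross_entropy(logits, targets)


class CrossModalInteractionSketch(nn.Module):
    """Projects each modality into a shared space, applies inter-modal
    contrastive terms, and lets audio predict the vision/text features."""

    def __init__(self, d_text: int, d_vision: int, d_audio: int, d_shared: int = 128):
        super().__init__()
        self.proj_t = nn.Linear(d_text, d_shared)
        self.proj_v = nn.Linear(d_vision, d_shared)
        self.proj_a = nn.Linear(d_audio, d_shared)
        # Audio -> vision/text prediction heads (the "cross-modal prediction" idea).
        self.a2v = nn.Sequential(nn.Linear(d_shared, d_shared), nn.ReLU(), nn.Linear(d_shared, d_shared))
        self.a2t = nn.Sequential(nn.Linear(d_shared, d_shared), nn.ReLU(), nn.Linear(d_shared, d_shared))
        self.classifier = nn.Linear(3 * d_shared, 1)         # fused features -> sentiment score

    def forward(self, x_t, x_v, x_a):
        h_t, h_v, h_a = self.proj_t(x_t), self.proj_v(x_v), self.proj_a(x_a)
        # Inter-modal contrastive terms: same utterance across modalities forms a positive pair.
        loss_contrast = info_nce(h_t, h_v) + info_nce(h_t, h_a) + info_nce(h_v, h_a)
        # Cross-modal prediction: audio reconstructs the vision/text representations.
        loss_cross = F.mse_loss(self.a2v(h_a), h_v.detach()) + F.mse_loss(self.a2t(h_a), h_t.detach())
        # Simple late fusion for the final sentiment prediction (the paper uses a hierarchical fusion).
        score = self.classifier(torch.cat([h_t, h_v, h_a], dim=-1)).squeeze(-1)
        return score, loss_contrast, loss_cross


if __name__ == "__main__":
    # Hypothetical feature sizes for a batch of 4 utterances.
    model = CrossModalInteractionSketch(d_text=768, d_vision=35, d_audio=74)
    xt, xv, xa = torch.randn(4, 768), torch.randn(4, 35), torch.randn(4, 74)
    score, l_con, l_cross = model(xt, xv, xa)
    print(score.shape, l_con.item(), l_cross.item())

In a training loop, the auxiliary terms would typically be added to the supervised sentiment loss with small weights, e.g. loss = loss_task + 0.1 * l_con + 0.1 * l_cross; the weights here are placeholders, not values from the paper.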
Pages: 15