Hybrid cross-modal interaction learning for multimodal sentiment analysis

Cited by: 5
Authors
Fu, Yanping [1 ]
Zhang, Zhiyuan [2 ]
Yang, Ruidi [1 ]
Yao, Cuiyou [1 ]
Affiliations
[1] Capital Univ Econ & Business, Sch Management & Engn, Beijing 100070, Peoples R China
[2] Beijing Jiaotong Univ, Sch Elect & Informat Engn, Key Lab Commun & Informat Syst, Beijing Municipal Commiss Educ, Beijing 100044, Peoples R China
Keywords
Multimodal sentiment analysis; Cross-modal interaction; Contrastive learning; Modal gap;
DOI
10.1016/j.neucom.2023.127201
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Multimodal sentiment analysis (MSA) predicts the sentiment polarity of an unlabeled utterance that carries multiple modalities, such as text, vision and audio, by learning from labeled utterances. Existing fusion methods mainly focus on modeling the relationships among the features of different modalities to improve emotion recognition. However, they often ignore the full range of interactions between modalities, especially cross-modal interaction, which is critical to the sentiment decision on multimodal data. To address these issues, we propose a novel hybrid cross-modal interaction learning (HCIL) framework that jointly learns intra-modal, inter-modal, interactive-modal and cross-modal interactions, allowing the model to fully exploit the sentiment information of all modalities and to strengthen the sentiment assistance between them. Specifically, we propose two core substructures for learning discriminative multimodal features. The first is a contrastive learning interaction structure that tracks class dynamics within each modality (intra-modal), reduces the modal gap between modalities (inter-modal) and establishes emotional communication across modality pairs (interactive-modal); the second is a cross-modal prediction structure that builds sentiment relationships between cross-modal pairs, in particular exploiting the auxiliary sentiment effect of audio on vision and text. Furthermore, we adopt a hierarchical feature fusion structure to generate the multimodal feature used for the final sentiment prediction. Extensive experiments on three benchmark datasets show that HCIL achieves strong performance on the MSA task and that the cross-modal interaction structure directly improves sentiment classification performance.
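To make the described design concrete, the following is a minimal PyTorch-style sketch of the kind of hybrid objective the abstract outlines: unimodal encoders, an InfoNCE-style contrastive term that pulls representations of the same utterance together across modalities (inter-modal), a cross-modal prediction head in which audio assists text and vision, and a simple fusion classifier. All module names, feature dimensions and loss weights below are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossModalPredictor(nn.Module):
    """Predicts one modality's representation from another (e.g. audio -> text),
    mirroring the cross-modal prediction structure described in the abstract."""

    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, src, tgt):
        # L2 prediction error encourages src features to carry information about tgt.
        return F.mse_loss(self.net(src), tgt.detach())


def info_nce(anchor, positive, temperature=0.1):
    """InfoNCE contrastive loss: matching rows of `anchor` and `positive`
    (same utterance, different modality) are treated as positive pairs."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    logits = anchor @ positive.t() / temperature              # (B, B) similarities
    labels = torch.arange(anchor.size(0), device=anchor.device)
    return F.cross_entropy(logits, labels)


class HCILSketch(nn.Module):
    """Hypothetical skeleton: unimodal encoders, an inter-modal contrastive term,
    audio->text / audio->vision prediction, and a simple fusion classifier."""

    def __init__(self, dim=128, num_classes=3):
        super().__init__()
        self.enc_t = nn.Linear(300, dim)   # text features (assumed dimensionality)
        self.enc_v = nn.Linear(35, dim)    # vision features (assumed dimensionality)
        self.enc_a = nn.Linear(74, dim)    # audio features (assumed dimensionality)
        self.pred_a2t = CrossModalPredictor(dim)
        self.pred_a2v = CrossModalPredictor(dim)
        self.fuse = nn.Sequential(
            nn.Linear(3 * dim, dim), nn.ReLU(), nn.Linear(dim, num_classes)
        )

    def forward(self, text, vision, audio):
        t, v, a = self.enc_t(text), self.enc_v(vision), self.enc_a(audio)
        # Inter-modal contrast pulls same-utterance representations together
        # across modalities, shrinking the modal gap.
        loss_contrast = info_nce(t, v) + info_nce(t, a)
        # Cross-modal prediction: audio assists text and vision, as in the abstract.
        loss_cross = self.pred_a2t(a, t) + self.pred_a2v(a, v)
        logits = self.fuse(torch.cat([t, v, a], dim=-1))
        return logits, loss_contrast, loss_cross


if __name__ == "__main__":
    model = HCILSketch()
    text, vision, audio = torch.randn(8, 300), torch.randn(8, 35), torch.randn(8, 74)
    logits, l_con, l_cross = model(text, vision, audio)
    # A full training objective would weight the auxiliary terms, e.g.
    # loss = F.cross_entropy(logits, labels) + 0.1 * (l_con + l_cross)
    print(logits.shape, l_con.item(), l_cross.item())
```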
Pages: 15