Enhancing Cross-Modal Alignment in Multimodal Sentiment Analysis via Prompt Learning

Cited by: 0
Authors
Wang, Xiaofan [1 ]
Li, Xiuhong [1 ]
Li, Zhe [2 ,3 ]
Zhou, Chenyu [1 ]
Chen, Fan [1 ]
Yang, Dan [1 ]
Affiliations
[1] Xinjiang Univ, Sch Informat Sci & Engn, Urumqi, Peoples R China
[2] Hong Kong Polytech Univ, Dept Elect & Elect Engn, Hong Kong, Peoples R China
[3] Stanford Univ, Dept Elect Engn, Stanford, CA 94305 USA
Keywords
Prompt learning; Multimodal Sentiment Analysis; Alignment;
DOI
10.1007/978-981-97-8620-6_37
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multimodal sentiment analysis (MSA) aims to predict the sentiment expressed in paired images and texts. Cross-modal feature alignment is crucial for models to understand the context and extract complementary semantic features. However, most previous MSA approaches have shown deficiencies in aligning features across modalities. Experimental evidence shows that prompt learning can align features effectively, and prior studies have applied prompt learning to MSA tasks, but only in a unimodal context; applying prompt learning to multimodal feature alignment remains a challenge. This paper presents a multimodal sentiment analysis model based on alignment prompts (MSAPL). Our model generates text and image alignment prompts via the Kronecker Product, enhancing visual-modality engagement and the correlation between visual and textual data, thus enabling a better understanding of multimodal inputs. Simultaneously, it employs a multi-layer, stepwise learning approach to acquire textual and image features, progressively modeling stage-feature relationships for rich contextual learning. Experiments on three public datasets demonstrate that our model consistently outperforms all baseline models.
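The abstract names the Kronecker Product as the operation that fuses text and image prompts into a joint alignment prompt. The paper's exact formulation is not reproduced in this record, so the sketch below is only a minimal illustration of the operation itself: all vector dimensions and variable names (`p_text`, `p_image`, `p_align`) are hypothetical, chosen to show how the product captures every pairwise interaction between the two prompt vectors.

```python
import numpy as np

# Hypothetical prompt dimensions, purely for illustration.
d_text, d_image = 4, 3

rng = np.random.default_rng(0)
p_text = rng.standard_normal(d_text)    # stand-in for a learnable text prompt
p_image = rng.standard_normal(d_image)  # stand-in for a learnable image prompt

# The Kronecker Product yields a joint vector containing the product of
# every text-prompt component with every image-prompt component, so the
# resulting alignment prompt encodes all pairwise cross-modal interactions.
p_align = np.kron(p_text, p_image)  # shape: (d_text * d_image,)
print(p_align.shape)  # (12,)
```

Each block of `d_image` consecutive entries in `p_align` is one text-prompt component scaled by the whole image prompt, which is why the operation is a natural way to let the visual modality participate in every element of the fused prompt.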
Pages: 541-554
Number of pages: 14
Related Papers
50 records
  • [41] Learning Relation Alignment for Calibrated Cross-modal Retrieval
    Ren, Shuhuai
    Lin, Junyang
    Zhao, Guangxiang
    Men, Rui
    Yang, An
    Zhou, Jingren
    Sun, Xu
    Yang, Hongxia
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 1 (ACL-IJCNLP 2021), 2021, : 514 - 524
  • [42] Multimodal Sentiment Analysis Using Multi-tensor Fusion Network with Cross-modal Modeling
    Yan, Xueming
    Xue, Haiwei
    Jiang, Shengyi
    Liu, Ziang
    APPLIED ARTIFICIAL INTELLIGENCE, 2022, 36 (01)
  • [43] Multimodal Sentiment Analysis Based on Cross-Modal Attention and Gated Cyclic Hierarchical Fusion Networks
    Quan, Zhibang
    Sun, Tao
    Su, Mengli
    Wei, Jishu
    COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2022, 2022
  • [44] Multichannel Cross-Modal Fusion Network for Multimodal Sentiment Analysis Considering Language Information Enhancement
    Hu, Ronglong
    Yi, Jizheng
    Chen, Aibin
    Chen, Lijiang
    IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2024, 20 (07) : 9814 - 9824
  • [45] Multimodal Sentiment Analysis Method Based on Cross-Modal Attention and Gated Unit Fusion Network
    Chen, Yansong
    Zhang, Le
    Zhang, Leihan
    Lü, Xueqiang
    Data Analysis and Knowledge Discovery, 2024, 8 (07) : 67 - 76
  • [46] Cross-Modal Transformer Combination Model for Sentiment Analysis
    Wang, Liang
    Wang, Yi
    Wang, Jun
    Computer Engineering and Applications, 2024, 60 (13) : 124 - 1350
  • [47] Cross-Modal Generalization: Learning in Low Resource Modalities via Meta-Alignment
    Liang, Paul Pu
    Wu, Peter
    Liu Ziyin
    Morency, Louis-Philippe
    Salakhutdinov, Ruslan
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 2680 - 2689
  • [48] Deep Compositional Cross-modal Learning to Rank via Local-Global Alignment
    Jiang, Xinyang
    Wu, Fei
    Li, Xi
    Zhao, Zhou
    Lu, Weiming
    Tang, Siliang
    Zhuang, Yueting
    MM'15: PROCEEDINGS OF THE 2015 ACM MULTIMEDIA CONFERENCE, 2015, : 69 - 78
  • [49] Image sentiment analysis via active sample refinement and cross-modal semantics mining
    Zhang H.-B.
    Shi H.-W.
    Xiong Q.-P.
    Hou J.-Y.
    Kongzhi yu Juece/Control and Decision, 2022, 37 (11): : 2949 - 2958
  • [50] ROBUST LATENT REPRESENTATIONS VIA CROSS-MODAL TRANSLATION AND ALIGNMENT
    Rajan, Vandana
    Brutti, Alessio
    Cavallaro, Andrea
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 4315 - 4319