Enhancing Cross-Modal Alignment in Multimodal Sentiment Analysis via Prompt Learning

Cited by: 0
Authors
Wang, Xiaofan [1 ]
Li, Xiuhong [1 ]
Li, Zhe [2 ,3 ]
Zhou, Chenyu [1 ]
Chen, Fan [1 ]
Yang, Dan [1 ]
Affiliations
[1] Xinjiang Univ, Sch Informat Sci & Engn, Urumqi, Peoples R China
[2] Hong Kong Polytech Univ, Dept Elect & Elect Engn, Hong Kong, Peoples R China
[3] Stanford Univ, Dept Elect Engn, Stanford, CA 94305 USA
Keywords
Prompt Learning; Multimodal Sentiment Analysis; Alignment
DOI
10.1007/978-981-97-8620-6_37
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Multimodal sentiment analysis (MSA) aims to predict the sentiment expressed in paired images and texts. Cross-modal feature alignment is crucial for models to understand the context and extract complementary semantic features. However, most previous MSA methods fall short in aligning features across modalities. Experimental evidence shows that prompt learning can align features effectively, and prior studies have applied prompt learning to MSA, but only in a unimodal context; applying prompt learning to multimodal feature alignment remains a challenge. This paper proposes a multimodal sentiment analysis model based on alignment prompts (MSAPL). Our model generates text and image alignment prompts via the Kronecker product, strengthening the engagement of the visual modality and the correlation between visual and textual data, and thus enabling a better understanding of multimodal inputs. It also adopts a multi-layer, stepwise learning approach to acquire textual and image features, progressively modeling the relationships between stage-wise features to learn rich contextual representations. Experiments on three public datasets demonstrate that our model consistently outperforms all baseline models.
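Below is a minimal, hypothetical sketch of how alignment prompts could be generated from pooled text and image features via a Kronecker product, as the abstract describes. The class name, low-rank projections, prompt length, and dimensions are illustrative assumptions, not the exact MSAPL implementation.

import torch
import torch.nn as nn

class AlignmentPromptGenerator(nn.Module):
    def __init__(self, text_dim=768, image_dim=768, rank=32,
                 prompt_len=4, prompt_dim=768):
        super().__init__()
        # Low-rank projections keep the Kronecker interaction small
        # (rank * rank instead of text_dim * image_dim); this is an
        # illustrative assumption, not necessarily MSAPL's configuration.
        self.text_proj = nn.Linear(text_dim, rank)
        self.image_proj = nn.Linear(image_dim, rank)
        self.to_prompts = nn.Linear(rank * rank, prompt_len * prompt_dim)
        self.prompt_len = prompt_len
        self.prompt_dim = prompt_dim

    def forward(self, text_feat, image_feat):
        # text_feat: (B, text_dim) pooled text features, e.g. a BERT [CLS] vector
        # image_feat: (B, image_dim) pooled image features, e.g. a ViT [CLS] vector
        t = self.text_proj(text_feat)      # (B, rank)
        v = self.image_proj(image_feat)    # (B, rank)
        # Batched Kronecker product of two vectors = flattened outer product.
        kron = torch.einsum("bi,bj->bij", t, v).flatten(1)   # (B, rank * rank)
        prompts = self.to_prompts(kron)                        # (B, prompt_len * prompt_dim)
        # Reshape into prompt tokens that can be prepended to the encoder input.
        return prompts.view(-1, self.prompt_len, self.prompt_dim)

# Usage sketch: generate alignment prompt tokens for a batch of two samples.
generator = AlignmentPromptGenerator()
text_feat = torch.randn(2, 768)
image_feat = torch.randn(2, 768)
alignment_prompts = generator(text_feat, image_feat)   # shape (2, 4, 768)

The generated tokens would then condition a multimodal encoder so that textual and visual features are aligned before sentiment prediction.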
Pages: 541-554 (14 pages)