Enhancing Cross-Modal Alignment in Multimodal Sentiment Analysis via Prompt Learning

Cited by: 0
Authors
Wang, Xiaofan [1 ]
Li, Xiuhong [1 ]
Li, Zhe [2 ,3 ]
Zhou, Chenyu [1 ]
Chen, Fan [1 ]
Yang, Dan [1 ]
Affiliations
[1] Xinjiang Univ, Sch Informat Sci & Engn, Urumqi, Peoples R China
[2] Hong Kong Polytech Univ, Dept Elect & Elect Engn, Hong Kong, Peoples R China
[3] Stanford Univ, Dept Elect Engn, Stanford, CA 94305 USA
Keywords
Prompt learning; Multimodal Sentiment Analysis; Alignment;
DOI
10.1007/978-981-97-8620-6_37
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multimodal sentiment analysis (MSA) aims to predict the sentiment expressed in paired images and texts. Cross-modal feature alignment is crucial for models to understand the context and extract complementary semantic features. However, most previous MSA approaches have shown deficiencies in aligning features across different modalities. Experimental evidence shows that prompt learning can effectively align features, and previous studies have applied prompt learning to MSA tasks, but only in a unimodal context. Applying prompt learning to multimodal feature alignment remains a challenge. This paper proposes a multimodal sentiment analysis model based on alignment prompts (MSAPL). Our model generates text and image alignment prompts via the Kronecker product, enhancing visual-modality engagement and the correlation between visual and textual data, thus enabling a better understanding of multimodal data. It also employs a multi-layer, stepwise learning approach to acquire textual and image features, progressively modeling the relationships between stage-wise features for rich contextual learning. Our experiments on three public datasets demonstrate that our model consistently outperforms all baseline models.
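The abstract's core mechanism, generating a cross-modal alignment prompt from the Kronecker product of text and image features, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the names (`text_feat`, `img_feat`, `proj`), the dimensions, and the random stand-in for a learned projection are all assumptions.

```python
# Illustrative sketch of Kronecker-product alignment-prompt generation.
# All names and dimensions are hypothetical, not taken from the MSAPL paper.
import numpy as np

rng = np.random.default_rng(0)

d_text, d_img, d_prompt = 8, 6, 16

text_feat = rng.standard_normal(d_text)  # pooled text representation
img_feat = rng.standard_normal(d_img)    # pooled image representation

# The Kronecker product captures all pairwise interactions between
# the two modalities' feature dimensions: shape (d_text * d_img,)
interaction = np.kron(text_feat, img_feat)

# A learned projection would map the interaction tensor to a
# fixed-length alignment prompt; a random matrix stands in for it here.
proj = rng.standard_normal((d_prompt, d_text * d_img)) / np.sqrt(d_text * d_img)
alignment_prompt = proj @ interaction

print(interaction.shape)       # (48,)
print(alignment_prompt.shape)  # (16,)
```

The appeal of the Kronecker product here is that, unlike concatenation, it models every pairwise feature interaction between modalities, at the cost of a quadratically larger intermediate vector that the projection then compresses.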
Pages: 541-554 (14 pages)
Related Papers (50 records)
  • [21] Multimodal Graph Learning for Cross-Modal Retrieval
    Xie, Jingyou
    Zhao, Zishuo
    Lin, Zhenzhou
    Shen, Ying
    PROCEEDINGS OF THE 2023 SIAM INTERNATIONAL CONFERENCE ON DATA MINING, SDM, 2023, : 145 - 153
  • [22] A Text-Centered Shared-Private Framework via Cross-Modal Prediction for Multimodal Sentiment Analysis
    Wu, Yang
    Lin, Zijie
    Zhao, Yanyan
    Qin, Bing
    Zhu, Li-Nan
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 4730 - 4738
  • [23] Prompt Learning for Multimodal Intent Recognition with Modal Alignment Perception
    Chen, Yuzhao
    Zhu, Wenhua
    Yu, Weilun
    Xue, Hongfei
    Fu, Hao
    Lin, Jiali
    Jiang, Dazhi
    COGNITIVE COMPUTATION, 2024, 16 (06) : 3417 - 3428
  • [24] Which is Making the Contribution: Modulating Unimodal and Cross-modal Dynamics for Multimodal Sentiment Analysis
    Zeng, Ying
    Mai, Sijie
    Hu, Haifeng
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021, : 1262 - 1274
  • [25] Multimodal Sentiment Analysis Network Based on Distributional Transformation and Gated Cross-Modal Fusion
    Zhang, Yuchen
    Thong, Hong
    Chen, Guilin
    Alhusaini, Naji
    Zhao, Shenghui
    Wu, Cheng
    2024 INTERNATIONAL CONFERENCE ON NETWORKING AND NETWORK APPLICATIONS, NANA 2024, 2024, : 496 - 503
  • [26] Multimodal Sentiment Analysis in Realistic Environments Based on Cross-Modal Hierarchical Fusion Network
    Huang, Ju
    Lu, Pengtao
    Sun, Shuifa
    Wang, Fangyi
    ELECTRONICS, 2023, 12 (16)
  • [27] Video multimodal sentiment analysis using cross-modal feature translation and dynamical propagation
    Gan, Chenquan
    Tang, Yu
    Fu, Xiang
    Zhu, Qingyi
    Jain, Deepak Kumar
    Garcia, Salvador
    KNOWLEDGE-BASED SYSTEMS, 2024, 299
  • [28] Cross-modal dynamic sentiment annotation for speech sentiment analysis
    Chen, Jincai
    Sun, Chao
    Zhang, Sheng
    Zeng, Jiangfeng
    COMPUTERS & ELECTRICAL ENGINEERING, 2023, 106
  • [29] Cross-modal complementary network with hierarchical fusion for multimodal sentiment classification
    Peng, Cheng
    Zhang, Chunxia
    Xue, Xiaojun
    Gao, Jiameng
    Liang, Hongjian
    Niu, Zhengdong
    TSINGHUA SCIENCE AND TECHNOLOGY, 2022, 27 (04) : 664 - 679