Enhancing Cross-Modal Alignment in Multimodal Sentiment Analysis via Prompt Learning

Cited by: 0
Authors
Wang, Xiaofan [1 ]
Li, Xiuhong [1 ]
Li, Zhe [2 ,3 ]
Zhou, Chenyu [1 ]
Chen, Fan [1 ]
Yang, Dan [1 ]
Affiliations
[1] Xinjiang Univ, Sch Informat Sci & Engn, Urumqi, Peoples R China
[2] Hong Kong Polytech Univ, Dept Elect & Elect Engn, Hong Kong, Peoples R China
[3] Stanford Univ, Dept Elect Engn, Stanford, CA 94305 USA
Keywords
Prompt learning; Multimodal Sentiment Analysis; Alignment
DOI
10.1007/978-981-97-8620-6_37
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Multimodal sentiment analysis (MSA) aims to predict the sentiment expressed in paired images and texts. Cross-modal feature alignment is crucial for models to understand the context and extract complementary semantic features. However, most previous MSA approaches fall short in aligning features across modalities. Experimental evidence shows that prompt learning can align features effectively, and prior studies have applied prompt learning to MSA, but only in a unimodal context; applying prompt learning to multimodal feature alignment remains a challenge. This paper presents a multimodal sentiment analysis model based on alignment prompts (MSAPL). The model generates text and image alignment prompts via the Kronecker product, strengthening the engagement of the visual modality and the correlation between visual and textual data, thus enabling a better understanding of multimodal inputs. It also adopts a multi-layer, stepwise learning approach to acquire textual and image features, progressively modeling stage-wise feature relationships for rich contextual learning. Experiments on three public datasets demonstrate that the model consistently outperforms all baseline models.
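To make the Kronecker-product prompt generation concrete, the sketch below shows one plausible PyTorch rendering. It is a minimal illustration, not the authors' released implementation: the module name AlignmentPromptGenerator, the pooled 768-dimensional text and image features, the low-rank projections, and the mapping from the Kronecker interaction to a fixed number of prompt tokens are all assumptions made for exposition.

```python
# A minimal sketch, assuming pooled text/image features (e.g., from BERT/ViT)
# and a low-rank projection before the Kronecker product. All names and
# dimensions are illustrative, not the paper's actual implementation.
import torch
import torch.nn as nn

class AlignmentPromptGenerator(nn.Module):
    """Generates cross-modal alignment prompts from pooled text and image
    features via a Kronecker (outer) product of low-rank projections."""

    def __init__(self, text_dim=768, image_dim=768, rank=32,
                 prompt_len=4, prompt_dim=768):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, rank)    # low-rank text projection
        self.image_proj = nn.Linear(image_dim, rank)  # low-rank image projection
        # Maps the flattened rank x rank interaction to prompt_len prompt tokens.
        self.to_prompts = nn.Linear(rank * rank, prompt_len * prompt_dim)
        self.prompt_len, self.prompt_dim = prompt_len, prompt_dim

    def forward(self, text_feat, image_feat):
        t = self.text_proj(text_feat)    # (B, rank)
        v = self.image_proj(image_feat)  # (B, rank)
        # Batched Kronecker product of two vectors = outer product, flattened.
        interaction = torch.einsum("bi,bj->bij", t, v).flatten(1)  # (B, rank*rank)
        prompts = self.to_prompts(interaction)
        return prompts.view(-1, self.prompt_len, self.prompt_dim)  # (B, L, D)

# Example: the generated prompt tokens could be prepended to the text token
# embeddings before the multimodal encoder.
gen = AlignmentPromptGenerator()
prompts = gen(torch.randn(2, 768), torch.randn(2, 768))
print(prompts.shape)  # torch.Size([2, 4, 768])
```

In this sketch the prompt tokens would be concatenated with the textual token sequence so that the encoder attends jointly to both modalities; the paper's exact integration into its multi-layer, stepwise learning pipeline may differ.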
Pages: 541-554
Number of pages: 14