Hybrid cross-modal interaction learning for multimodal sentiment analysis

Cited by: 5
Authors
Fu, Yanping [1 ]
Zhang, Zhiyuan [2 ]
Yang, Ruidi [1 ]
Yao, Cuiyou [1 ]
Affiliations
[1] Capital Univ Econ & Business, Sch Management & Engn, Beijing 100070, Peoples R China
[2] Beijing Jiaotong Univ, Sch Elect & Informat Engn, Key Lab Commun & Informat Syst, Beijing Municipal Commiss Educ, Beijing 100044, Peoples R China
Keywords
Multimodal sentiment analysis; Cross-modal interaction; Contrastive learning; Modal gap;
DOI
10.1016/j.neucom.2023.127201
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Multimodal sentiment analysis (MSA) predicts the sentiment polarity of an unlabeled utterance that carries multiple modalities, such as text, vision and audio, by learning from labeled utterances. Existing fusion methods mainly focus on modeling the relationships among the features of different modalities to improve emotion recognition. However, they often ignore the full range of interactions between modalities, especially cross-modal interaction, which is critical to the sentiment decision on multimodal data. To address these issues, we propose a novel hybrid cross-modal interaction learning (HCIL) framework that jointly learns intra-modal, inter-modal, interactive-modal and cross-modal interactions, allowing the model to fully exploit the sentiment information of all modalities and to strengthen the sentiment assistance between them. Specifically, we propose two core substructures for learning discriminative multimodal features. The first is a contrastive learning interaction structure that tracks class dynamics within each modality (intra-modal), reduces the modal gap between modalities (inter-modal) and establishes emotional communication across modality pairs (interactive-modal); the second is a cross-modal prediction structure that builds sentiment relationships between cross-modal pairs, in particular exploiting the auxiliary sentiment effect of audio on vision and text. Furthermore, we adopt a hierarchical feature fusion structure to generate the multimodal feature used for the final sentiment prediction. Extensive experiments on three benchmark datasets show that HCIL achieves strong performance on the MSA task and that the cross-modal interaction structure directly improves sentiment classification performance.
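To make the described design concrete, the following is a minimal PyTorch-style sketch of the kind of hybrid objective the abstract outlines: unimodal encoders, an InfoNCE-style contrastive term that pulls representations of the same utterance together across modalities (inter-modal), a cross-modal prediction head in which audio assists text and vision, and a simple fusion classifier. All module names, feature dimensions and loss weights below are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossModalPredictor(nn.Module):
    """Predicts one modality's representation from another (e.g. audio -> text),
    mirroring the cross-modal prediction structure described in the abstract."""

    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, src, tgt):
        # L2 prediction error encourages src features to carry information about tgt.
        return F.mse_loss(self.net(src), tgt.detach())


def info_nce(anchor, positive, temperature=0.1):
    """InfoNCE contrastive loss: matching rows of `anchor` and `positive`
    (same utterance, different modality) are treated as positive pairs."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    logits = anchor @ positive.t() / temperature              # (B, B) similarities
    labels = torch.arange(anchor.size(0), device=anchor.device)
    return F.cross_entropy(logits, labels)


class HCILSketch(nn.Module):
    """Hypothetical skeleton: unimodal encoders, an inter-modal contrastive term,
    audio->text / audio->vision prediction, and a simple fusion classifier."""

    def __init__(self, dim=128, num_classes=3):
        super().__init__()
        self.enc_t = nn.Linear(300, dim)   # text features (assumed dimensionality)
        self.enc_v = nn.Linear(35, dim)    # vision features (assumed dimensionality)
        self.enc_a = nn.Linear(74, dim)    # audio features (assumed dimensionality)
        self.pred_a2t = CrossModalPredictor(dim)
        self.pred_a2v = CrossModalPredictor(dim)
        self.fuse = nn.Sequential(
            nn.Linear(3 * dim, dim), nn.ReLU(), nn.Linear(dim, num_classes)
        )

    def forward(self, text, vision, audio):
        t, v, a = self.enc_t(text), self.enc_v(vision), self.enc_a(audio)
        # Inter-modal contrast pulls same-utterance representations together
        # across modalities, shrinking the modal gap.
        loss_contrast = info_nce(t, v) + info_nce(t, a)
        # Cross-modal prediction: audio assists text and vision, as in the abstract.
        loss_cross = self.pred_a2t(a, t) + self.pred_a2v(a, v)
        logits = self.fuse(torch.cat([t, v, a], dim=-1))
        return logits, loss_contrast, loss_cross


if __name__ == "__main__":
    model = HCILSketch()
    text, vision, audio = torch.randn(8, 300), torch.randn(8, 35), torch.randn(8, 74)
    logits, l_con, l_cross = model(text, vision, audio)
    # A full training objective would weight the auxiliary terms, e.g.
    # loss = F.cross_entropy(logits, labels) + 0.1 * (l_con + l_cross)
    print(logits.shape, l_con.item(), l_cross.item())
```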
Pages: 15