Crossmodal Translation Based Meta Weight Adaption for Robust Image-Text Sentiment Analysis

Citations: 0
Authors
Zhang, Baozheng [1 ,2 ,3 ]
Yuan, Ziqi [2 ]
Xu, Hua [2 ,4 ]
Gao, Kai [3 ]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, State Key Lab Intelligent Technol & Syst, Beijing 100084, Peoples R China
[2] Samton Jiangxi Technol Dev Co Ltd, Nanchang 330036, Peoples R China
[3] Hebei Univ Sci & Technol, Sch Informat Sci & Engn, Shijiazhuang 050018, Peoples R China
[4] Tsinghua Univ, Dept Comp Sci & Technol, State Key Lab Intelligent Technol & Syst, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China;
关键词
Robustness; Task analysis; Sentiment analysis; Semantics; Metalearning; Representation learning; Social networking (online); Crossmodal translation; image-text sentiment analysis; meta learning; robustness and reliability; CLASSIFICATION; NETWORK;
DOI
10.1109/TMM.2024.3405662
Chinese Library Classification (CLC)
TP [Automation & Computer Technology];
Discipline Code
0812;
Abstract
The image-text sentiment analysis task has garnered increasing attention in recent years due to the surge in user-generated content on social media platforms. Previous research efforts have made noteworthy progress by leveraging the affective concepts shared between the vision and text modalities. However, emotional cues may reside exclusively within one of the modalities, owing to their modality-independent nature and the potential absence of certain modalities. In this study, we aim to emphasize the significance of modality-independent emotional behaviors in addition to modality-invariant behaviors. To achieve this, we propose a novel approach called Crossmodal Translation-Based Meta Weight Adaption (CTMWA). Specifically, our approach constructs a crossmodal translation network that serves as the encoder. This architecture captures the concepts shared between visual content and text, empowering the model to handle scenarios where either the visual or the textual modality is missing. Building upon the translation-based framework, we introduce a unimodal weight adaption strategy. Leveraging the meta-learning paradigm, the strategy gradually learns to acquire unimodal weights for individual instances from a few hand-crafted meta instances with unimodal annotations. This enables us to modulate the gradients of each modality encoder based on the discrepancy between the modalities during training. Extensive experiments are conducted on three benchmark image-text sentiment analysis datasets: MVSA-Single, MVSA-Multiple, and TumEmo. The empirical results demonstrate that the proposed approach achieves the highest performance across all conventional image-text databases. Furthermore, experiments under modality-missing settings and a case study on reliable sentiment prediction further exhibit the superior robustness and reliability of the proposed approach.
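The unimodal weight adaption described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `unimodal_weights`, the softmax form, and the use of per-modality unimodal losses as the discrepancy signal are all illustrative assumptions.

```python
import numpy as np

def unimodal_weights(loss_text, loss_image, tau=1.0):
    """Hypothetical weight adaption: give a larger weight to the modality
    with the smaller unimodal loss (i.e., the more reliable emotional cue),
    via a softmax over negated losses with temperature tau."""
    losses = np.array([loss_text, loss_image])
    logits = -losses / tau                 # smaller loss -> larger logit
    e = np.exp(logits - logits.max())      # numerically stable softmax
    return e / e.sum()

# Toy usage: the text modality has the smaller unimodal loss here,
# so its encoder gradient is scaled up relative to the image encoder's.
w_t, w_i = unimodal_weights(loss_text=0.2, loss_image=0.8)
grad_text = np.array([1.0, -1.0])
grad_image = np.array([0.5, 0.5])
scaled_grad_text = w_t * grad_text
scaled_grad_image = w_i * grad_image
```

In the paper these weights are learned per instance via meta-learning from hand-crafted meta instances with unimodal annotations, rather than computed in closed form as above.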
Pages: 9949-9961
Page count: 13