Crossmodal Translation Based Meta Weight Adaption for Robust Image-Text Sentiment Analysis

Cited: 0
Authors
Zhang, Baozheng [1,2,3]
Yuan, Ziqi [2]
Xu, Hua [2,4]
Gao, Kai [3]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, State Key Lab Intelligent Technol & Syst, Beijing 100084, Peoples R China
[2] Samton Jiangxi Technol Dev Co Ltd, Nanchang 330036, Peoples R China
[3] Hebei Univ Sci & Technol, Sch Informat Sci & Engn, Shijiazhuang 050018, Peoples R China
[4] Tsinghua Univ, Dept Comp Sci & Technol, State Key Lab Intelligent Technol & Syst, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Robustness; Task analysis; Sentiment analysis; Semantics; Metalearning; Representation learning; Social networking (online); Crossmodal translation; image-text sentiment analysis; meta learning; robustness and reliability; CLASSIFICATION; NETWORK
DOI
10.1109/TMM.2024.3405662
CLC Number
TP [Automation Technology & Computer Technology]
Discipline Code
0812
Abstract
The image-text sentiment analysis task has garnered increasing attention in recent years due to the surge in user-generated content on social media platforms. Previous research efforts have made noteworthy progress by leveraging the affective concepts shared between the vision and text modalities. However, emotional cues may reside exclusively in one of the modalities, owing to the modality-independent nature of some affective behaviors and the potential absence of certain modalities. In this study, we emphasize the significance of modality-independent emotional behaviors in addition to modality-invariant ones. To this end, we propose a novel approach called Crossmodal Translation-Based Meta Weight Adaption (CTMWA). Specifically, our approach constructs a crossmodal translation network that serves as the encoder. This architecture captures the concepts shared between vision content and text, enabling the model to handle scenarios where either the visual or the textual modality is missing. Building upon this translation-based framework, we introduce a unimodal weight adaption strategy. Leveraging the meta-learning paradigm, the strategy gradually learns to assign unimodal weights to individual instances from a few hand-crafted meta instances with unimodal annotations. This enables us to modulate the gradients of each modality encoder according to the discrepancy between modalities during training. Extensive experiments are conducted on three benchmark image-text sentiment analysis datasets, namely MVSA-Single, MVSA-Multiple, and TumEmo. The empirical results demonstrate that the proposed approach achieves the best performance on all three conventional image-text datasets. Furthermore, experiments under modality-missing settings and a case study on reliable sentiment prediction further exhibit the superior robustness and reliability of the proposed approach.
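As a concrete illustration of the two mechanisms outlined in the abstract, the PyTorch sketch below shows (1) a crossmodal translation encoder that substitutes a missing modality with its translation from the available one, and (2) per-instance unimodal loss weights that modulate the gradient magnitude reaching each modality encoder. All module names, feature dimensions, the linear translators, and the weighting interface are illustrative assumptions; this is a minimal sketch of the general technique, not the authors' CTMWA implementation.

```python
# Minimal sketch, assuming a PyTorch-style setup. Encoders and
# translators are single linear layers purely for illustration.
import torch
import torch.nn as nn

class CrossmodalTranslator(nn.Module):
    def __init__(self, dim_img=512, dim_txt=768, dim_shared=256):
        super().__init__()
        self.img_enc = nn.Linear(dim_img, dim_shared)    # image -> shared space
        self.txt_enc = nn.Linear(dim_txt, dim_shared)    # text  -> shared space
        self.img2txt = nn.Linear(dim_shared, dim_shared) # translate image repr. to text repr.
        self.txt2img = nn.Linear(dim_shared, dim_shared) # translate text repr. to image repr.

    def forward(self, img_feat=None, txt_feat=None):
        z_img = self.img_enc(img_feat) if img_feat is not None else None
        z_txt = self.txt_enc(txt_feat) if txt_feat is not None else None
        # If one modality is absent, reconstruct its representation by
        # translating from the other modality.
        if z_img is None:
            z_img = self.txt2img(z_txt)
        if z_txt is None:
            z_txt = self.img2txt(z_img)
        return z_img, z_txt

def weighted_loss(loss_img, loss_txt, w):
    # loss_img, loss_txt: per-instance losses of shape (batch,);
    # w: (batch, 2) unimodal weights. Scaling each modality's loss
    # scales the gradients that flow back into that modality's encoder.
    return (w[:, 0] * loss_img + w[:, 1] * loss_txt).mean()

# Usage: encode a batch where the text modality is missing.
model = CrossmodalTranslator()
img = torch.randn(4, 512)
z_img, z_txt = model(img_feat=img, txt_feat=None)
```

In the paper's meta-learning setting, the per-instance weights would themselves be produced by a network trained on the hand-crafted meta instances with unimodal annotations; in this sketch they are simply passed in as a tensor.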
Pages: 9949-9961 (13 pages)