Crossmodal Translation Based Meta Weight Adaption for Robust Image-Text Sentiment Analysis

Cited: 0
Authors
Zhang, Baozheng [1 ,2 ,3 ]
Yuan, Ziqi [2 ]
Xu, Hua [2 ,4 ]
Gao, Kai [3 ]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, State Key Lab Intelligent Technol & Syst, Beijing 100084, Peoples R China
[2] Samton Jiangxi Technol Dev Co Ltd, Nanchang 330036, Peoples R China
[3] Hebei Univ Sci & Technol, Sch Informat Sci & Engn, Shijiazhuang 050018, Peoples R China
[4] Tsinghua Univ, Dept Comp Sci & Technol, State Key Lab Intelligent Technol & Syst, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Robustness; Task analysis; Sentiment analysis; Semantics; Metalearning; Representation learning; Social networking (online); Crossmodal translation; image-text sentiment analysis; meta learning; robustness and reliability; CLASSIFICATION; NETWORK;
DOI
10.1109/TMM.2024.3405662
CLC Classification
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
The Image-Text Sentiment Analysis task has garnered increasing attention in recent years due to the surge in user-generated content on social media platforms. Previous research efforts have made noteworthy progress by leveraging the affective concepts shared between the vision and text modalities. However, emotional cues may reside exclusively within one of the modalities, owing to their modality-independent nature and the potential absence of certain modalities. In this study, we aim to emphasize the significance of modality-independent emotional behaviors, in addition to modality-invariant behaviors. To this end, we propose a novel approach called Crossmodal Translation-Based Meta Weight Adaption (CTMWA). Specifically, our approach constructs a crossmodal translation network that serves as the encoder. This architecture captures the concepts shared between visual content and text, enabling the model to handle scenarios in which either the visual or the textual modality is missing. Building upon this translation-based framework, we introduce a unimodal weight adaption strategy. Leveraging the meta-learning paradigm, the proposed strategy gradually learns to assign unimodal weights to individual instances from a few hand-crafted meta instances with unimodal annotations. This enables us to modulate the gradients of each modality encoder according to the discrepancy between modalities during training. Extensive experiments are conducted on three benchmark image-text sentiment analysis datasets: MVSA-Single, MVSA-Multiple, and TumEmo. The empirical results demonstrate that the proposed approach achieves the highest performance across all conventional image-text databases. Furthermore, experiments under modality-missing settings and a case study on reliable sentiment prediction further exhibit the superior robustness and reliability of the proposed approach.
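The core idea of modulating each modality encoder's gradients by a learned per-modality weight can be illustrated with a minimal sketch. This is not the authors' implementation: the toy linear "encoders", the squared-error losses, and the softmax-over-losses weighting rule are all illustrative assumptions standing in for the paper's meta-learned weight adaption; only the overall pattern (weigh each modality's gradient update by its relative discrepancy) follows the abstract.

```python
import numpy as np

# Illustrative sketch (hypothetical, not the CTMWA code): weight each
# modality's gradient step by a discrepancy-derived scale during joint
# training, mimicking the effect of meta-learned unimodal weights.

rng = np.random.default_rng(0)

# Toy linear "encoders" for the text and image modalities.
W_text = rng.normal(size=(4, 3))
W_image = rng.normal(size=(4, 3))

def unimodal_loss(W, x, y):
    """Mean squared error of a linear encoder's prediction."""
    return float(np.mean((x @ W - y) ** 2))

def unimodal_grad(W, x, y):
    """Analytic gradient of the mean squared error w.r.t. W."""
    return 2.0 * x.T @ (x @ W - y) / x.shape[0]

x_text = rng.normal(size=(8, 4))
x_image = rng.normal(size=(8, 4))
y = rng.normal(size=(8, 3))

# Discrepancy-based weights: here, the higher-loss modality simply gets
# the larger gradient scale (softmax over unimodal losses). In CTMWA this
# mapping is learned via meta-learning rather than fixed by hand.
l_t = unimodal_loss(W_text, x_text, y)
l_i = unimodal_loss(W_image, x_image, y)
w_t, w_i = np.exp([l_t, l_i]) / np.exp([l_t, l_i]).sum()

# Modulated gradient descent step for each modality encoder.
lr = 0.1
W_text -= lr * w_t * unimodal_grad(W_text, x_text, y)
W_image -= lr * w_i * unimodal_grad(W_image, x_image, y)
```

The weights sum to one, so the update redistributes a fixed gradient budget between modalities rather than changing the overall learning rate.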
Pages: 9949-9961
Page count: 13
Related Papers
50 records total
  • [21] Image-Text Sentiment Analysis Via Context Guided Adaptive Fine-Tuning Transformer
    Xiao, Xingwang
    Pu, Yuanyuan
    Zhao, Zhengpeng
    Nie, Rencan
    Xu, Dan
    Qian, Wenhua
    Wu, Hao
    NEURAL PROCESSING LETTERS, 2023, 55 (03) : 2103 - 2125
  • [22] Image-Text Multimodal Sentiment Analysis Framework of Assamese News Articles Using Late Fusion
    Das, Ringki
    Singh, Thoudam Doren
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (06)
  • [23] Image-text sentiment analysis based on hierarchical interaction fusion and contrast learning enhanced (vol 146, 110262, 2025)
    Wang, Hongbin
    Du, Qifei
    Xiang, Yan
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2025, 151
  • [24] Neuron-Based Spiking Transmission and Reasoning Network for Robust Image-Text Retrieval
    Li, Wenrui
    Ma, Zhengyu
    Deng, Liang-Jian
    Fan, Xiaopeng
    Tian, Yonghong
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (07) : 3516 - 3528
  • [25] Text-to-Image Generation Method Based on Image-Text Semantic Consistency
    Xue, Z.
    Xu, Z.
    Lang, C.
    Feng, S.
    Wang, T.
    Li, Y.
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2023, 60 (09): : 2180 - 2190
  • [26] Breaking Through the Noisy Correspondence: A Robust Model for Image-Text Matching
    Shi, Haitao
    Liu, Meng
    Mu, Xiaoxuan
    Song, Xuemeng
    Hu, Yupeng
    Nie, Liqiang
    ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2024, 42 (06)
  • [27] An Image-Text Sentiment Analysis Method Using Multi-Channel Multi-Modal Joint Learning
    Gong, Lianting
    He, Xingzhou
    Yang, Jianzhong
    APPLIED ARTIFICIAL INTELLIGENCE, 2024, 38 (01)
  • [28] Social Image-Text Sentiment Classification With Cross-Modal Consistency and Knowledge Distillation
    Liu, Huan
    Li, Ke
    Fan, Jianping
    Yan, Caixia
    Qin, Tao
    Zheng, Qinghua
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2023, 14 (04) : 3332 - 3344
  • [29] Hashing based Efficient Inference for Image-Text Matching
    Tu, Rong-Cheng
    Ji, Lei
    Luo, Huaishao
    Shi, Botian
    Huang, Heyan
    Duan, Nan
    Mao, Xian-Ling
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 743 - 752
  • [30] IMAGE-TEXT ALIGNMENT AND RETRIEVAL USING LIGHT-WEIGHT TRANSFORMER
    Li, Wenrui
    Fan, Xiaopeng
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 4758 - 4762