Learning from the global view: Supervised contrastive learning of multimodal representation

Cited by: 9
Authors
Mai, Sijie [1 ]
Zeng, Ying [1 ]
Hu, Haifeng [1 ]
Affiliations
[1] Sun Yat Sen Univ, Sch Elect & Informat Technol, Guangzhou 510006, Guangdong, Peoples R China
Keywords
Multimodal sentiment analysis; Multimodal representation learning; Contrastive learning; Multimodal humor detection; FUSION;
DOI
10.1016/j.inffus.2023.101920
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Technological advances have made abundant multimodal data available for many representation learning tasks. However, most methods ignore the rich modality-correlation information stored in each multimodal object and fail to fully exploit the potential of multimodal data. To address this issue, cross-modal contrastive learning methods have been proposed to learn the similarity score of each modality pair in a self- or weakly-supervised manner and improve model robustness. Though effective, contrastive learning based on unimodal representations can be inaccurate, as unimodal representations fail to reveal the global information of multimodal objects. To this end, we propose a contrastive learning pipeline based on multimodal representations to learn from the global view, and devise multiple techniques to generate negative and positive samples for each anchor. To generate positive samples, we apply the mix-up operation to mix the multimodal representations of two different objects that have the maximal label similarity. Moreover, we devise a permutation-invariant fusion mechanism that defines positive samples by permuting the input order of modalities for fusion and by sampling various contrastive fusion networks. In this way, we force the multimodal representation to be invariant to the order of modalities and to the structure of the fusion network, so that the model captures high-level semantic information of multimodal objects. To define negative samples, for each modality we randomly replace the unimodal representation with that of a dissimilar object when synthesizing the multimodal representation; this drives the model to capture the high-level co-occurrence and correspondence relationships between modalities within each object. We also directly define the multimodal representation of another object as a negative sample, where the chosen object shares the minimal label similarity with the anchor. The proposed framework leverages label information to learn a more discriminative multimodal embedding space for downstream tasks. Extensive experiments demonstrate that our method outperforms previous state-of-the-art baselines on the tasks of multimodal sentiment analysis and humor detection.
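The sampling scheme summarized in the abstract can be sketched in a few lines. This is a minimal NumPy illustration, not the authors' implementation: the element-wise-sum `fuse`, the mix-up coefficient `alpha`, and the plain InfoNCE loss are simplified stand-ins for the paper's sampled fusion networks and training objective.

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse(modalities):
    # Order-insensitive fusion (element-wise sum) stands in for the
    # paper's permutation-invariant fusion mechanism, which instead
    # permutes modality order and samples fusion networks.
    return np.sum(modalities, axis=0)

def mixup_positive(z_anchor, z_similar, alpha=0.4):
    # Positive sample: mix the anchor's multimodal representation with
    # that of the object sharing the maximal label similarity.
    # (alpha is an assumed hyperparameter, not taken from the paper.)
    lam = rng.beta(alpha, alpha)
    return lam * z_anchor + (1.0 - lam) * z_similar

def synthesize_negative(unimodal_feats, dissimilar_feats):
    # Negative sample: replace one randomly chosen modality's unimodal
    # representation with that of a dissimilar object, then fuse, so
    # the result breaks the anchor's cross-modal correspondence.
    feats = list(unimodal_feats)
    m = rng.integers(len(feats))
    feats[m] = dissimilar_feats[m]
    return fuse(feats)

def info_nce(anchor, positive, negatives, tau=0.1):
    # Standard InfoNCE loss over cosine similarities.
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
    logits = np.array([cos(anchor, positive)]
                      + [cos(anchor, n) for n in negatives]) / tau
    logits -= logits.max()          # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])
```

Because `fuse` here is a symmetric sum, `fuse([z_text, z_audio])` equals `fuse([z_audio, z_text])`, mirroring the invariance to modality order that the paper enforces through its permutation-invariant fusion.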
Pages: 14
Related Papers
50 records in total
  • [31] Supervised contrastive learning for recommendation
    Yang, Chun
    Zou, Jianxiao
    Wu, JianHua
    Xu, Hongbing
    Fan, Shicai
    KNOWLEDGE-BASED SYSTEMS, 2022, 258
  • [32] Adversarial supervised contrastive learning
    Li, Zhuorong
    Yu, Daiwei
    Wu, Minghui
    Jin, Canghong
    Yu, Hongchuan
    MACHINE LEARNING, 2023, 112 (06) : 2105 - 2130
  • [33] Motion Sensitive Contrastive Learning for Self-supervised Video Representation
    Ni, Jingcheng
    Zhou, Nan
    Qin, Jie
    Wu, Qian
    Liu, Junqi
    Li, Boxun
    Huang, Di
    COMPUTER VISION - ECCV 2022, PT XXXV, 2022, 13695 : 457 - 474
  • [34] Low-dimensional Representation of OCT Volumes with Supervised Contrastive Learning
    Marginean, Anca
    Bianca, Vesa
    Nicoara, Simona Delia
    Muntean, George
    2022 IEEE 18TH INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTER COMMUNICATION AND PROCESSING, ICCP, 2022, : 47 - 54
  • [35] Contrastive Self-Supervised Learning With Smoothed Representation for Remote Sensing
    Jung, Heechul
    Oh, Yoonju
    Jeong, Seongho
    Lee, Chaehyeon
    Jeon, Taegyun
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2022, 19
  • [36] Contrastive Self-supervised Representation Learning Using Synthetic Data
    She, Dong-Yu
    Xu, Kun
    INTERNATIONAL JOURNAL OF AUTOMATION AND COMPUTING, 2021, 18 (04) : 556 - 567
  • [39] Self-supervised Segment Contrastive Learning for Medical Document Representation
    Abro, Waheed Ahmed
    Kteich, Hanane
    Bouraoui, Zied
    ARTIFICIAL INTELLIGENCE IN MEDICINE, PT I, AIME 2024, 2024, 14844 : 312 - 321