SSDMM-VAE: variational multi-modal disentangled representation learning

Cited by: 0
Authors
Arnab Kumar Mondal
Ajay Sailopal
Parag Singla
Prathosh AP
Institutions
[1] IIT Delhi, Amar Nath and Shashi Khosla School of Information Technology
[2] IIT Delhi, Department of Mathematics and Computing
[3] IIT Delhi, Department of Computer Science and Engineering
[4] Indian Institute of Science, Department of Electrical Communication Engineering
Source
Applied Intelligence | 2023, Vol. 53
Keywords
Multimodal VAE; Disentangled representation learning
DOI
Not available
Abstract
Multi-modal learning aims to simultaneously model data from several modalities such as image, text and speech. The goal is to learn representations that are disentangled, so that a variety of downstream tasks such as causal reasoning, fair ML and domain adaptation are well supported. In this work, we propose a novel semi-supervised method to learn disentangled representations for multi-modal data using variational inference. We incorporate a two-component latent space in a Variational Auto-Encoder (VAE) that comprises domain-invariant (shared) and domain-specific (private) representations across modalities, with partitioned discrete and continuous components. We combine the shared continuous and discrete latent spaces via a Product-of-Experts and statistical ensembles, respectively. We conduct several experiments on multiple multimodal datasets (dSprite-Text, Shaped3D-Text) to demonstrate the efficacy of the proposed method for learning disentangled representations. The proposed method achieves state-of-the-art FactorVAE scores (0.93 and 1.00, respectively), surpassing various unimodal and multimodal baselines. Further, we demonstrate the benefits of learning disentangled joint representations on several downstream tasks (generation and classification) using the MNIST-MADBase dataset, with a joint coherence score of 96.95%. We demonstrate the use of variational inference for disentangled joint representations in a semi-supervised multimodal setting and its benefits in various downstream tasks.
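The abstract's Product-of-Experts fusion of the shared continuous latent space has a well-known closed form for Gaussian experts: the product of Gaussians is a Gaussian whose precision is the sum of the experts' precisions and whose mean is the precision-weighted average of their means. A minimal NumPy sketch of that fusion step (the function name and array layout are illustrative, not the authors' implementation):

```python
import numpy as np

def product_of_experts(mus, logvars):
    """Fuse per-modality Gaussian posteriors q_m(z) = N(mu_m, sigma_m^2)
    into a single Gaussian via a Product-of-Experts.

    mus, logvars: arrays of shape (num_experts, latent_dim).
    Returns the fused (mu, logvar), each of shape (latent_dim,).
    """
    mus = np.asarray(mus, dtype=float)
    precisions = np.exp(-np.asarray(logvars, dtype=float))  # 1 / sigma_m^2
    combined_precision = precisions.sum(axis=0)             # precisions add
    combined_var = 1.0 / combined_precision
    combined_mu = combined_var * (precisions * mus).sum(axis=0)
    return combined_mu, np.log(combined_var)
```

With two unit-variance experts at means 0 and 2, the fused posterior is N(1, 0.5): the experts agree on a midpoint and the variance shrinks, which is why PoE fusion sharpens the shared representation as more modalities are observed.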
Pages: 8467-8481 (14 pages)