SSDMM-VAE: variational multi-modal disentangled representation learning

Cited by: 0
Authors
Arnab Kumar Mondal
Ajay Sailopal
Parag Singla
Prathosh AP
Affiliations
[1] Amar Nath and Shashi Khosla School of Information Technology, IIT Delhi
[2] Department of Mathematics and Computing, IIT Delhi
[3] Department of Computer Science and Engineering, IIT Delhi
[4] Department of Electrical Communication Engineering, Indian Institute of Science
Source
Applied Intelligence | 2023, Vol. 53
Keywords
Multimodal VAE; Disentangled representation learning
DOI
Not available
Abstract
Multi-modal learning aims to jointly model data from several modalities such as image, text and speech. The goal is to learn representations that are disentangled, so that a variety of downstream tasks such as causal reasoning, fair ML and domain adaptation are well supported. In this work, we propose a novel semi-supervised method for learning disentangled representations of multi-modal data using variational inference. We incorporate a two-component latent space in a Variational Auto-Encoder (VAE) that comprises domain-invariant (shared) and domain-specific (private) representations across modalities, each partitioned into discrete and continuous components. We combine the shared continuous and discrete latent spaces via a Product-of-Experts and statistical ensembles, respectively (see the sketch below). We conduct experiments on multiple multimodal datasets (dSprite-Text, Shaped3D-Text) to demonstrate the efficacy of the proposed method for learning disentangled representations. The proposed method achieves state-of-the-art FactorVAE scores (0.93 and 1.00, respectively), surpassing various unimodal and multimodal baselines. Further, we demonstrate the benefits of learning disentangled joint representations on several downstream tasks (generation and classification) using the MNIST-MADBase dataset, with a joint coherence score of 96.95%. Overall, we demonstrate the use of variational inference for disentangled joint representation learning in a semi-supervised multimodal setting and its benefits on various downstream tasks.
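The abstract states that the shared continuous latent is fused across modalities with a Product-of-Experts, while the shared discrete latent is combined via a statistical ensemble. Below is a minimal sketch of what such a fusion step could look like, assuming Gaussian per-modality posteriors parameterised by means and log-variances and categorical per-modality posteriors parameterised by logits. The function names (`product_of_experts`, `ensemble_categorical`) and the simple probability-averaging used for the discrete ensemble are illustrative assumptions, not the authors' exact implementation.

```python
import torch

def product_of_experts(mus, logvars, eps=1e-8):
    # mus, logvars: (num_experts, batch, latent_dim), one Gaussian expert per
    # modality (a standard-normal prior expert could be prepended as well).
    # Under a Product-of-Experts, precisions add and the joint mean is the
    # precision-weighted average of the expert means.
    precisions = torch.exp(-logvars)                  # 1 / sigma^2 per expert
    joint_var = 1.0 / (precisions.sum(dim=0) + eps)   # inverse of summed precisions
    joint_mu = joint_var * (mus * precisions).sum(dim=0)
    return joint_mu, torch.log(joint_var + eps)

def ensemble_categorical(logits):
    # logits: (num_experts, batch, num_classes). A simple ensemble that
    # averages the per-modality categorical posteriors; this is an assumed
    # stand-in for the paper's "statistical ensemble" of discrete latents.
    return torch.softmax(logits, dim=-1).mean(dim=0)

# Example: two modalities, batch of 4, 10-d continuous and 3-way discrete latents.
mus = torch.randn(2, 4, 10)
logvars = torch.zeros(2, 4, 10)
cat_logits = torch.randn(2, 4, 3)
joint_mu, joint_logvar = product_of_experts(mus, logvars)
joint_probs = ensemble_categorical(cat_logits)
```

One practical property of the Gaussian PoE is that a modality whose encoder is very uncertain (large variance) contributes little to the joint posterior, which is why it is a common choice for combining modality-specific inference networks.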
Pages: 8467–8481
Page count: 14