A Theoretical Analysis of Multi-Modal Representation Learning with Regular Functions

Cited by: 0
Authors
Vural, Elif [1]
Affiliations
[1] Orta Dogu Tekn Univ, Elekt & Elekt Muhendisligi Bolumu, Ankara, Turkey
Keywords
Multi-modal learning; cross-modal retrieval; theoretical analysis; Lipschitz-continuous functions
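DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract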
Multi-modal data analysis methods often learn representations that align different modalities in a new common domain, while preserving the within-class compactness and within-modality geometry and enhancing the between-class separation. In this study, we present a theoretical performance analysis for multi-modal representation learning methods. We consider a quite general family of algorithms learning a nonlinear embedding of the data space into a new space via regular functions. We derive sufficient conditions on the properties of the embedding so that high multi-modal classification or cross-modal retrieval performance is attained. Our results show that if the Lipschitz constant of the embedding function is kept sufficiently small while increasing the between-class separation, then the probability of correct classification or retrieval approaches 1 at an exponential rate with the number of training samples.
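As a reading aid, the two notions the abstract relies on can be written out as follows. This is only a sketch of the general form such statements take, not the theorem proved in the paper: the symbols f, L, N, C, and c are assumptions introduced here, and the paper's actual constants and conditions are not given in this record.

```latex
% Lipschitz continuity of an embedding f with constant L (standard definition):
\| f(x) - f(y) \| \le L \, \| x - y \| \quad \text{for all } x, y .

% Generic shape of an exponential-rate guarantee in the number N of training samples;
% C, c > 0 are placeholder constants, not values taken from the paper:
P(\text{correct classification or retrieval}) \ge 1 - C \, e^{-c N} .
```

Read this way, a smaller L limits how far the embedding can spread nearby samples apart, which is why the abstract pairs a small Lipschitz constant with a large between-class separation.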
Pages: 4