A Theoretical Analysis of Multi-Modal Representation Learning with Regular Functions

Cited by: 0
Authors
Vural, Elif [1 ]
Affiliations
[1] Orta Dogu Tekn Univ, Elekt & Elekt Muhendisligi Bolumu, Ankara, Turkey
Keywords
Multi-modal learning; cross-modal retrieval; theoretical analysis; Lipschitz-continuous functions;
DOI
Not available
CLC Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline Codes
0808; 0809;
Abstract
Multi-modal data analysis methods often learn representations that align different modalities in a new common domain while preserving within-class compactness and within-modality geometry and enhancing between-class separation. In this study, we present a theoretical performance analysis of multi-modal representation learning methods. We consider a fairly general family of algorithms that learn a nonlinear embedding of the data space into a new space via regular functions. We derive sufficient conditions on the properties of the embedding under which high multi-modal classification or cross-modal retrieval performance is attained. Our results show that if the Lipschitz constant of the embedding function is kept sufficiently small while the between-class separation is increased, then the probability of correct classification or retrieval approaches 1 at an exponential rate in the number of training samples.
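The abstract's condition can be made concrete with a small numerical sketch: embed two modalities with a shared Lipschitz-continuous map, measure an empirical lower bound on its Lipschitz constant and the between-class separation in the embedded space, and check cross-modal nearest-neighbor retrieval. This is an illustrative toy setup only; the data model, the linear stand-in embedding `f(x) = Wx`, and all variable names are assumptions, not the paper's actual construction or bounds.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-class, two-modality data (hypothetical; not the paper's setup).
n = 40
X_a = np.vstack([rng.normal(0.0, 0.3, (n, 2)),
                 rng.normal(3.0, 0.3, (n, 2))])   # modality A samples
X_b = np.vstack([rng.normal(0.0, 0.3, (n, 2)),
                 rng.normal(3.0, 0.3, (n, 2))])   # modality B samples
y = np.array([0] * n + [1] * n)                   # shared class labels

# Stand-in shared embedding f(x) = W x; the paper treats general nonlinear
# Lipschitz-continuous embeddings, of which a linear map is the simplest case.
W = np.array([[1.0, 0.5],
              [-0.5, 1.0]])

def f(X):
    return X @ W.T

def pairwise(X, Z):
    """Euclidean distances between all rows of X and all rows of Z."""
    return np.sqrt(((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1))

# Empirical lower bound on the Lipschitz constant of f over modality A:
# the largest ratio of embedded distance to input distance.
d_in, d_out = pairwise(X_a, X_a), pairwise(f(X_a), f(X_a))
mask = d_in > 0
lip = (d_out[mask] / d_in[mask]).max()

# Between-class separation vs. within-class spread in the embedded space.
E_a, E_b = f(X_a), f(X_b)
cross = pairwise(E_a, E_b)                 # rows: modality A, cols: modality B
same_class = y[:, None] == y[None, :]
sep = cross[~same_class].min()             # closest cross-class pair
spread = cross[same_class].max()           # farthest same-class pair

# Cross-modal 1-NN retrieval: query with modality B, retrieve from modality A.
acc = (y[cross.argmin(axis=0)] == y).mean()
print(f"Lipschitz >= {lip:.3f}, separation = {sep:.3f}, "
      f"spread = {spread:.3f}, retrieval accuracy = {acc:.2f}")
```

With the classes well separated relative to the embedding's Lipschitz constant, retrieval is near perfect, which is the regime the paper's sufficient conditions describe; shrinking the gap between class means degrades `acc`, mirroring the trade-off between the Lipschitz constant and the between-class separation.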
Pages: 4
Related Papers
50 records
  • [21] Guo, Daya; Zeng, Zhaoyang. Multi-modal Representation Learning for Video Advertisement Content Structuring. Proceedings of the 29th ACM International Conference on Multimedia (MM 2021), 2021: 4770-4774.
  • [22] Yang, Weili; Huang, Junduan; Luo, Dacan; Kang, Wenxiong. Efficient disentangled representation learning for multi-modal finger biometrics. Pattern Recognition, 2024, 145.
  • [23] Zablocki, Eloi; Piwowarski, Benjamin; Soulier, Laure; Gallinari, Patrick. Learning Multi-Modal Word Representation Grounded in Visual Context. Thirty-Second AAAI Conference on Artificial Intelligence (AAAI 2018), 2018: 5626-5633.
  • [24] Chen, Zhongfeng; Lu, Zhenyu; Rong, Huan; Zhao, Chuanjun; Xu, Fan. Multi-modal anchor adaptation learning for multi-modal summarization. Neurocomputing, 2024, 570.
  • [25] Lei, Yunwen; Ying, Yiming. Generalization analysis of multi-modal metric learning. Analysis and Applications, 2016, 14(4): 503-521.
  • [26] Mondal, Arnab Kumar; Sailopal, Ajay; Singla, Parag; Ap, Prathosh. SSDMM-VAE: variational multi-modal disentangled representation learning. Applied Intelligence, 2023, 53(7): 8467-8481.
  • [27] Park, Jaeyoo; Han, Bohyung. Multi-Modal Representation Learning with Text-Driven Soft Masks. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023): 2798-2807.
  • [28] Nedungadi, Vishal; Kariryaa, Ankit; Oehmcke, Stefan; Belongie, Serge; Igel, Christian; Lang, Nico. MMEarth: Exploring Multi-modal Pretext Tasks for Geospatial Representation Learning. Computer Vision - ECCV 2024, Part LXIV, 2025, 15122: 164-182.
  • [29] Gao, Lei; Guan, Ling. A Discriminant Information Theoretic Learning Framework for Multi-modal Feature Representation. ACM Transactions on Intelligent Systems and Technology, 2023, 14(3).
  • [30] Zhang, Yazhou; Tiwari, Prayag; Rong, Lu; Chen, Rui; Alnajem, Nojoom A.; Hossain, M. Shamim. Affective Interaction: Attentive Representation Learning for Multi-Modal Sentiment Classification. ACM Transactions on Multimedia Computing Communications and Applications, 2022, 18(3).