A Theoretical Analysis of Multi-Modal Representation Learning with Regular Functions

Cited by: 0
Authors
Vural, Elif [1 ]
Affiliations
[1] Orta Dogu Tekn Univ, Elekt & Elekt Muhendisligi Bolumu, Ankara, Turkey
Keywords
Multi-modal learning; cross-modal retrieval; theoretical analysis; Lipschitz-continuous functions
DOI
Not available
CLC Classification
TM [Electrical engineering]; TN [Electronic and communication technology]
Discipline codes
0808; 0809
Abstract
Multi-modal data analysis methods often learn representations that align different modalities in a new common domain while preserving within-class compactness and within-modality geometry and enhancing between-class separation. In this study, we present a theoretical performance analysis of multi-modal representation learning methods. We consider a fairly general family of algorithms that learn a nonlinear embedding of the data space into a new space via regular functions. We derive sufficient conditions on the properties of the embedding under which high multi-modal classification or cross-modal retrieval performance is attained. Our results show that if the Lipschitz constant of the embedding function is kept sufficiently small while the between-class separation is increased, the probability of correct classification or retrieval approaches 1 at an exponential rate in the number of training samples.
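The two quantities the abstract balances can be made concrete on toy data. The sketch below (not the paper's method; the embedding `f`, the weight matrix `W`, and the data generation are illustrative assumptions) estimates an empirical lower bound on the Lipschitz constant of an embedding and measures the between-class separation it achieves in the embedded space:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-class data: Gaussian clouds around two class centers.
# (Illustrative stand-in; the paper treats general multi-modal settings.)
centers = np.array([[0.0, 0.0], [4.0, 4.0]])
X = np.vstack([c + rng.normal(scale=0.3, size=(50, 2)) for c in centers])
labels = np.repeat([0, 1], 50)

def f(x):
    """A hypothetical smooth embedding. Since |tanh'| <= 1, its Lipschitz
    constant is at most the largest singular value of W (~0.51 here)."""
    W = np.array([[0.5, 0.1], [-0.1, 0.5]])
    return np.tanh(x @ W.T)

Z = f(X)

# Empirical lower bound on the Lipschitz constant of f:
# max over sample pairs of ||f(x) - f(y)|| / ||x - y||.
d_in = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
d_out = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
mask = d_in > 1e-9  # exclude self-pairs
lip_est = (d_out[mask] / d_in[mask]).max()

# Between-class separation in the embedded space: minimum distance
# between embedded points carrying different class labels.
cross = labels[:, None] != labels[None, :]
separation = d_out[cross].min()

print(f"empirical Lipschitz lower bound: {lip_est:.3f}")
print(f"between-class separation:        {separation:.3f}")
```

In the spirit of the stated result, a favorable embedding is one where `separation` stays large relative to `lip_est`: a small Lipschitz constant limits how far same-class (noise-perturbed) samples can spread, while a large embedded between-class distance keeps the classes apart.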
Pages: 4
Related papers
50 records
  • [31] Lightweight Multi-modal Representation Learning for RGB Salient Object Detection
    Xiao, Yun
    Huang, Yameng
    Li, Chenglong
    Liu, Lei
    Zhou, Aiwu
    Tang, Jin
    [J]. COGNITIVE COMPUTATION, 2023, 15 (06) : 1868 - 1883
  • [32] Understanding and Constructing Latent Modality Structures in Multi-Modal Representation Learning
    Jiang, Qian
    Chen, Changyou
    Zhao, Han
    Chen, Liqun
    Ping, Qing
    Tran, Son Dinh
    Xu, Yi
    Zeng, Belinda
    Chilimbi, Trishul
    [J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 7661 - 7671
  • [33] Deep Multi-modal Latent Representation Learning for Automated Dementia Diagnosis
    Zhou, Tao
    Liu, Mingxia
    Fu, Huazhu
    Wang, Jun
    Shen, Jianbing
    Shao, Ling
    Shen, Dinggang
    [J]. MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2019, PT IV, 2019, 11767 : 629 - 638
  • [34] CLMTR: a generic framework for contrastive multi-modal trajectory representation learning
    Liang, Anqi
    Yao, Bin
    Xie, Jiong
    Zheng, Wenli
    Shen, Yanyan
    Ge, Qiqi
    [J]. GEOINFORMATICA, 2024,
  • [35] Self-Adaptive Representation Learning Model for Multi-Modal Sentiment and Sarcasm Joint Analysis
    Zhang, Yazhou
    Yu, Yang
    Wang, Mengyao
    Huang, Min
    Hossain, M. Shamim
    [J]. ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2024, 20 (05)
  • [36] Unsupervised Multi-modal Learning
    Iqbal, Mohammed Shameer
    [J]. ADVANCES IN ARTIFICIAL INTELLIGENCE (AI 2015), 2015, 9091 : 343 - 346
  • [37] Learning Multi-modal Similarity
    McFee, Brian
    Lanckriet, Gert
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2011, 12 : 491 - 523
  • [39] Multi-modal learning for affective content analysis in movies
    Yi, Yun
    Wang, Hanli
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (10) : 13331 - 13350
  • [40] Towards a Theoretical Framework for Learning Multi-modal Patterns for Embodied Agents
    Noceti, Nicoletta
    Caputo, Barbara
    Castellini, Claudio
    Baldassarre, Luca
    Barla, Annalisa
    Rosasco, Lorenzo
    Odone, Francesca
    Sandini, Giulio
    [J]. IMAGE ANALYSIS AND PROCESSING - ICIAP 2009, PROCEEDINGS, 2009, 5716 : 239 - +