A Theoretical Analysis of Multi-Modal Representation Learning with Regular Functions

Cited by: 0
Authors
Vural, Elif [1 ]
Affiliation
[1] Orta Dogu Tekn Univ, Elekt & Elekt Muhendisligi Bolumu, Ankara, Turkey
Keywords
Multi-modal learning; cross-modal retrieval; theoretical analysis; Lipschitz-continuous functions
DOI
Not available
CLC Classification (Chinese Library Classification)
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline Codes
0808; 0809
Abstract
Multi-modal data analysis methods often learn representations that align different modalities in a new common domain, while preserving within-class compactness and within-modality geometry and enhancing between-class separation. In this study, we present a theoretical performance analysis of multi-modal representation learning methods. We consider a fairly general family of algorithms that learn a nonlinear embedding of the data space into a new space via regular functions. We derive sufficient conditions on the properties of the embedding for high multi-modal classification or cross-modal retrieval performance to be attained. Our results show that if the Lipschitz constant of the embedding function is kept sufficiently small while the between-class separation is increased, then the probability of correct classification or retrieval approaches 1 at an exponential rate in the number of training samples.
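To make the flavor of this statement concrete, here is a minimal sketch in notation assumed for illustration only; the symbols f, L, N, C, and c below are not taken from the paper. An embedding f from the data space X into the common domain Z is L-Lipschitz when

\[ \| f(x) - f(x') \| \le L \, \| x - x' \| \quad \text{for all } x, x' \in \mathcal{X}, \]

and a guarantee of the type described in the abstract has the shape: if the between-class separation of the embedded training samples is sufficiently large relative to L, then the classification or retrieval error probability satisfies a bound of the form

\[ \Pr[\text{error}] \le C \, e^{-cN}, \]

with N the number of training samples and C, c > 0 constants, so that the probability of correct classification or retrieval approaches 1 exponentially in N.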
Pages: 4
Related Papers
50 in total
  • [41] Learning Multi-modal Similarity
    McFee, Brian
    Lanckriet, Gert
    JOURNAL OF MACHINE LEARNING RESEARCH, 2011, 12 : 491 - 523
  • [43] Multi-modal learning for affective content analysis in movies
    Yi, Yun
    Wang, Hanli
    MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (10) : 13331 - 13350
  • [44] Towards a Theoretical Framework for Learning Multi-modal Patterns for Embodied Agents
    Noceti, Nicoletta
    Caputo, Barbara
    Castellini, Claudio
    Baldassarre, Luca
    Barla, Annalisa
    Rosasco, Lorenzo
    Odone, Francesca
    Sandini, Giulio
    IMAGE ANALYSIS AND PROCESSING - ICIAP 2009, PROCEEDINGS, 2009, 5716 : 239 - +
  • [45] DrFuse: Learning Disentangled Representation for Clinical Multi-Modal Fusion with Missing Modality and Modal Inconsistency
    Yao, Wenfang
    Yin, Kejing
    Cheung, William K.
    Liu, Jia
    Qin, Jing
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 15, 2024, : 16416 - 16424
  • [46] Multi-modal Alignment using Representation Codebook
    Duan, Jiali
    Chen, Liqun
    Tran, Son
    Yang, Jinyu
    Xu, Yi
    Zeng, Belinda
    Chilimbi, Trishul
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 15630 - 15639
  • [47] Deep multi-modal learning for joint linear representation of nonlinear dynamical systems
    Qian, Shaodi
    Chou, Chun-An
    Li, Jr-Shin
    SCIENTIFIC REPORTS, 2022, 12 (01)
  • [48] Multi-modal Relation Distillation for Unified 3D Representation Learning
    Wang, Huiqun
    Bao, Yiping
    Pan, Panwang
    Li, Zeming
    Liu, Xiao
    Yang, Ruijie
    Huang, Di
    COMPUTER VISION - ECCV 2024, PT XXXIII, 2025, 15091 : 364 - 381
  • [49] Exploiting Multi-modal Fusion for Robust Face Representation Learning with Missing Modality
    Zhu, Yizhe
    Sun, Xin
    Zhou, Xi
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2023, PT II, 2023, 14255 : 283 - 294
  • [50] Molecular Joint Representation Learning via Multi-Modal Information of SMILES and Graphs
    Wu, Tianyu
    Tang, Yang
    Sun, Qiyu
    Xiong, Luolin
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2023, 20 (05) : 3044 - 3055