A Theoretical Analysis of Multi-Modal Representation Learning with Regular Functions

Cited by: 0
Authors
Vural, Elif [1 ]
Affiliations
[1] Orta Dogu Tekn Univ, Elekt & Elekt Muhendisligi Bolumu, Ankara, Turkey
Keywords
Multi-modal learning; cross-modal retrieval; theoretical analysis; Lipschitz-continuous functions;
DOI
Not available
CLC Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline Codes
0808; 0809;
Abstract
Multi-modal data analysis methods often learn representations that align different modalities in a new common domain while preserving within-class compactness and within-modality geometry and enhancing between-class separation. In this study, we present a theoretical performance analysis of multi-modal representation learning methods. We consider a fairly general family of algorithms that learn a nonlinear embedding of the data space into a new space via regular functions. We derive sufficient conditions on the properties of the embedding under which high multi-modal classification or cross-modal retrieval performance is attained. Our results show that if the Lipschitz constant of the embedding function is kept sufficiently small while the between-class separation is increased, then the probability of correct classification or retrieval approaches 1 at an exponential rate in the number of training samples.
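The abstract's condition can be made concrete with a small numerical sketch: embed two modalities with a shared Lipschitz-continuous map, measure an empirical lower bound on its Lipschitz constant and the between-class separation in the embedded space, and check cross-modal nearest-neighbor retrieval. This is an illustrative toy setup only; the data model, the linear stand-in embedding `f(x) = Wx`, and all variable names are assumptions, not the paper's actual construction or bounds.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-class, two-modality data (hypothetical; not the paper's setup).
n = 40
X_a = np.vstack([rng.normal(0.0, 0.3, (n, 2)),
                 rng.normal(3.0, 0.3, (n, 2))])   # modality A samples
X_b = np.vstack([rng.normal(0.0, 0.3, (n, 2)),
                 rng.normal(3.0, 0.3, (n, 2))])   # modality B samples
y = np.array([0] * n + [1] * n)                   # shared class labels

# Stand-in shared embedding f(x) = W x; the paper treats general nonlinear
# Lipschitz-continuous embeddings, of which a linear map is the simplest case.
W = np.array([[1.0, 0.5],
              [-0.5, 1.0]])

def f(X):
    return X @ W.T

def pairwise(X, Z):
    """Euclidean distances between all rows of X and all rows of Z."""
    return np.sqrt(((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1))

# Empirical lower bound on the Lipschitz constant of f over modality A:
# the largest ratio of embedded distance to input distance.
d_in, d_out = pairwise(X_a, X_a), pairwise(f(X_a), f(X_a))
mask = d_in > 0
lip = (d_out[mask] / d_in[mask]).max()

# Between-class separation vs. within-class spread in the embedded space.
E_a, E_b = f(X_a), f(X_b)
cross = pairwise(E_a, E_b)                 # rows: modality A, cols: modality B
same_class = y[:, None] == y[None, :]
sep = cross[~same_class].min()             # closest cross-class pair
spread = cross[same_class].max()           # farthest same-class pair

# Cross-modal 1-NN retrieval: query with modality B, retrieve from modality A.
acc = (y[cross.argmin(axis=0)] == y).mean()
print(f"Lipschitz >= {lip:.3f}, separation = {sep:.3f}, "
      f"spread = {spread:.3f}, retrieval accuracy = {acc:.2f}")
```

With the classes well separated relative to the embedding's Lipschitz constant, retrieval is near perfect, which is the regime the paper's sufficient conditions describe; shrinking the gap between class means degrades `acc`, mirroring the trade-off between the Lipschitz constant and the between-class separation.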
Pages: 4
Related Papers
50 records
  • [21] Guo, Daya; Zeng, Zhaoyang. Multi-modal Representation Learning for Video Advertisement Content Structuring. Proceedings of the 29th ACM International Conference on Multimedia (MM 2021), 2021: 4770-4774.
  • [22] Yang, Weili; Huang, Junduan; Luo, Dacan; Kang, Wenxiong. Efficient disentangled representation learning for multi-modal finger biometrics. Pattern Recognition, 2024, 145.
  • [23] Zablocki, Eloi; Piwowarski, Benjamin; Soulier, Laure; Gallinari, Patrick. Learning Multi-Modal Word Representation Grounded in Visual Context. Thirty-Second AAAI Conference on Artificial Intelligence (AAAI 2018), 2018: 5626-5633.
  • [24] Chen, Zhongfeng; Lu, Zhenyu; Rong, Huan; Zhao, Chuanjun; Xu, Fan. Multi-modal anchor adaptation learning for multi-modal summarization. Neurocomputing, 2024, 570.
  • [25] Lei, Yunwen; Ying, Yiming. Generalization analysis of multi-modal metric learning. Analysis and Applications, 2016, 14(4): 503-521.
  • [26] Mondal, Arnab Kumar; Sailopal, Ajay; Singla, Parag; Ap, Prathosh. SSDMM-VAE: variational multi-modal disentangled representation learning. Applied Intelligence, 2023, 53(7): 8467-8481.
  • [27] Park, Jaeyoo; Han, Bohyung. Multi-Modal Representation Learning with Text-Driven Soft Masks. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023): 2798-2807.
  • [28] Nedungadi, Vishal; Kariryaa, Ankit; Oehmcke, Stefan; Belongie, Serge; Igel, Christian; Lang, Nico. MMEarth: Exploring Multi-modal Pretext Tasks for Geospatial Representation Learning. Computer Vision - ECCV 2024, Part LXIV, 2025, 15122: 164-182.
  • [29] Gao, Lei; Guan, Ling. A Discriminant Information Theoretic Learning Framework for Multi-modal Feature Representation. ACM Transactions on Intelligent Systems and Technology, 2023, 14(3).
  • [30] Zhang, Yazhou; Tiwari, Prayag; Rong, Lu; Chen, Rui; Alnajem, Nojoom A.; Hossain, M. Shamim. Affective Interaction: Attentive Representation Learning for Multi-Modal Sentiment Classification. ACM Transactions on Multimedia Computing Communications and Applications, 2022, 18(3).