A Theoretical Analysis of Multi-Modal Representation Learning with Regular Functions

Cited by: 0
Authors
Vural, Elif [1 ]
Affiliation
[1] Orta Dogu Tekn Univ, Elekt & Elekt Muhendisligi Bolumu, Ankara, Turkey
Keywords
Multi-modal learning; cross-modal retrieval; theoretical analysis; Lipschitz-continuous functions
DOI
Not available
CLC Classification (Chinese Library Classification)
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline Codes
0808; 0809
Abstract
Multi-modal data analysis methods often learn representations that align different modalities in a new common domain, while preserving within-class compactness and within-modality geometry and enhancing between-class separation. In this study, we present a theoretical performance analysis of multi-modal representation learning methods. We consider a fairly general family of algorithms that learn a nonlinear embedding of the data space into a new space via regular functions. We derive sufficient conditions on the properties of the embedding for high multi-modal classification or cross-modal retrieval performance to be attained. Our results show that if the Lipschitz constant of the embedding function is kept sufficiently small while the between-class separation is increased, then the probability of correct classification or retrieval approaches 1 at an exponential rate in the number of training samples.
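To make the flavor of this statement concrete, here is a minimal sketch in notation assumed for illustration only; the symbols f, L, N, C, and c below are not taken from the paper. An embedding f from the data space X into the common domain Z is L-Lipschitz when

\[ \| f(x) - f(x') \| \le L \, \| x - x' \| \quad \text{for all } x, x' \in \mathcal{X}, \]

and a guarantee of the type described in the abstract has the shape: if the between-class separation of the embedded training samples is sufficiently large relative to L, then the classification or retrieval error probability satisfies a bound of the form

\[ \Pr[\text{error}] \le C \, e^{-cN}, \]

with N the number of training samples and C, c > 0 constants, so that the probability of correct classification or retrieval approaches 1 exponentially in N.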
Pages: 4
Related Papers
50 in total
  • [41] Learning Multi-modal Similarity
    McFee, Brian
    Lanckriet, Gert
    JOURNAL OF MACHINE LEARNING RESEARCH, 2011, 12 : 491 - 523
  • [43] Multi-modal learning for affective content analysis in movies
    Yi, Yun
    Wang, Hanli
    MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (10) : 13331 - 13350
  • [44] Towards a Theoretical Framework for Learning Multi-modal Patterns for Embodied Agents
    Noceti, Nicoletta
    Caputo, Barbara
    Castellini, Claudio
    Baldassarre, Luca
    Barla, Annalisa
    Rosasco, Lorenzo
    Odone, Francesca
    Sandini, Giulio
    IMAGE ANALYSIS AND PROCESSING - ICIAP 2009, PROCEEDINGS, 2009, 5716 : 239 - +
  • [45] DrFuse: Learning Disentangled Representation for Clinical Multi-Modal Fusion with Missing Modality and Modal Inconsistency
    Yao, Wenfang
    Yin, Kejing
    Cheung, William K.
    Liu, Jia
    Qin, Jing
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 15, 2024, : 16416 - 16424
  • [46] Multi-modal Alignment using Representation Codebook
    Duan, Jiali
    Chen, Liqun
    Tran, Son
    Yang, Jinyu
    Xu, Yi
    Zeng, Belinda
    Chilimbi, Trishul
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 15630 - 15639
  • [47] Deep multi-modal learning for joint linear representation of nonlinear dynamical systems
    Qian, Shaodi
    Chou, Chun-An
    Li, Jr-Shin
    SCIENTIFIC REPORTS, 2022, 12 (01)
  • [48] Multi-modal Relation Distillation for Unified 3D Representation Learning
    Wang, Huiqun
    Bao, Yiping
    Pan, Panwang
    Li, Zeming
    Liu, Xiao
    Yang, Ruijie
    Huang, Di
    COMPUTER VISION - ECCV 2024, PT XXXIII, 2025, 15091 : 364 - 381
  • [49] Exploiting Multi-modal Fusion for Robust Face Representation Learning with Missing Modality
    Zhu, Yizhe
    Sun, Xin
    Zhou, Xi
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2023, PT II, 2023, 14255 : 283 - 294
  • [50] Molecular Joint Representation Learning via Multi-Modal Information of SMILES and Graphs
    Wu, Tianyu
    Tang, Yang
    Sun, Qiyu
    Xiong, Luolin
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2023, 20 (05) : 3044 - 3055