Multi-modal Learning Algorithms and Network Architectures for Information Extraction and Retrieval

Cited by: 1

Authors:

Bleeker, Maurits [1]

Affiliations:

[1] Univ Amsterdam, Amsterdam, Netherlands

Source:

PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022 | 2022

Keywords:

Multi-modal representation learning; Multi-modal neural networks; Contrastive learning; Multi-modal embeddings;

DOI:

10.1145/3503161.3548757

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

Large-scale (pre-)training has recently achieved great success on both uni- and multi-modal downstream evaluation tasks. However, this training paradigm generally comes with a high cost, both in the amount of compute and data needed for training. In my Ph.D. thesis, I study the problem of multi-modal learning for information extraction and retrieval, with the main focus on new learning algorithms and network architectures to make the learning process more efficient. First, I introduce a novel network architecture for bidirectional decoding for the scene text recognition (STR) task. Next, I focus on the image-caption retrieval (ICR) task. I question if the results obtained in the metric learning field generalize to the ICR task. Finally, I focus on the reduction of shortcut learning for the ICR task. I introduce latent target decoding (LTD), a novel constraint-based learning algorithm which reduces shortcut feature learning by decoding the input caption in a semantic latent space.

Citation

Pages: 6925-6929

Page count: 5

50 items in total

[21] Hadamard matrix-guided multi-modal hashing for multi-modal retrieval
Yu, Jun
Huang, Wei
Li, Zuhe
Shu, Zhenqiu
Zhu, Liang
[J]. DIGITAL SIGNAL PROCESSING, 2022, 130
[22] The integration of information in a digital, multi-modal learning environment
Schueler, Anne
[J]. LEARNING AND INSTRUCTION, 2019, 59 : 76 - 87
[23] Multi-modal Information Extraction and Fusion with Convolutional Neural Networks
Kumar, Dinesh
Sharma, Dharmendra
[J]. 2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020
[24] A generic neural network for multi-modal sensorimotor learning
Carenzi, F
Bendahan, P
Roschin, VY
Frolov, AA
Gorce, P
Maier, MA
[J]. COMPUTATIONAL NEUROSCIENCE: TRENDS IN RESEARCH 2004, 2004, : 525 - 533
[25] Multi-modal anchor adaptation learning for multi-modal summarization
Chen, Zhongfeng
Lu, Zhenyu
Rong, Huan
Zhao, Chuanjun
Xu, Fan
[J]. NEUROCOMPUTING, 2024, 570
[26] A generic neural network for multi-modal sensorimotor learning
Carenzi, F
Bendahan, P
Roschin, VY
Frolov, AA
Gorce, P
Maier, MA
[J]. NEUROCOMPUTING, 2004, 58 : 525 - 533
[27] Potential Semantics in Multi-Modal Relevance Feedback Information for Image Retrieval
Li, Jiyi
Ma, Qiang
Asano, Yasuhito
Yoshikawa, Masatoshi
[J]. 2013 IEEE 37TH ANNUAL COMPUTER SOFTWARE AND APPLICATIONS CONFERENCE (COMPSAC), 2013, : 830 - 831
[28] Multi-Modal and Multi-Domain Embedding Learning for Fashion Retrieval and Analysis
Gu, Xiaoling
Wong, Yongkang
Shou, Lidan
Peng, Pai
Chen, Gang
Kankanhalli, Mohan S.
[J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2019, 21 (06) : 1524 - 1537
[29] Review of Multi-Modal Retrieval in Medicine
Ding, Guohui
Zhang, Qi
Fang, Shichao
Li, Qing
Sun, Xiaoyu
Zhang, Luxia
Kong, Guilan
[J]. Computer Engineering and Applications, 2023, 59 (01) : 26 - 36
[30] Multi-modal Subspace Learning with Joint Graph Regularization for Cross-modal Retrieval
Wang, Kaiye
Wang, Wei
He, Ran
Wang, Liang
Tan, Tieniu
[J]. 2013 SECOND IAPR ASIAN CONFERENCE ON PATTERN RECOGNITION (ACPR 2013), 2013, : 236 - 240
