Multi-modal Learning Algorithms and Network Architectures for Information Extraction and Retrieval

Cited by: 1

Author:

Bleeker, Maurits [1]

Affiliation:

[1] Univ Amsterdam, Amsterdam, Netherlands

Source:

PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022 | 2022

Keywords:

Multi-modal representation learning; Multi-modal neural networks; Contrastive learning; Multi-modal embeddings

DOI:

10.1145/3503161.3548757

Chinese Library Classification (CLC) number:

TP39 [Computer applications]

Discipline classification codes:

081203; 0835

Abstract:

Large-scale (pre-)training has recently achieved great success on both uni- and multi-modal downstream evaluation tasks. However, this training paradigm generally comes at a high cost, in both the compute and the data needed for training. In my Ph.D. thesis, I study the problem of multi-modal learning for information extraction and retrieval, with the main focus on new learning algorithms and network architectures that make the learning process more efficient. First, I introduce a novel network architecture for bidirectional decoding for the scene text recognition (STR) task. Next, I focus on the image-caption retrieval (ICR) task and ask whether results obtained in the metric learning field generalize to it. Finally, I focus on reducing shortcut learning for the ICR task. I introduce latent target decoding (LTD), a novel constraint-based learning algorithm that reduces shortcut feature learning by decoding the input caption in a semantic latent space.


Pages: 6925-6929

Number of pages: 5
