Multi-modal Learning Algorithms and Network Architectures for Information Extraction and Retrieval

Cited by: 1

Authors:

Bleeker, Maurits [1]

Affiliations:

[1] Univ Amsterdam, Amsterdam, Netherlands

Source:

PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022 | 2022

Keywords:

Multi-modal representation learning; Multi-modal neural networks; Contrastive learning; Multi-modal embeddings

DOI:

10.1145/3503161.3548757

Chinese Library Classification (CLC) number:

TP39 [Computer applications]

Discipline classification codes:

081203; 0835

Abstract:

Large-scale (pre-)training has recently achieved great success on both uni- and multi-modal downstream evaluation tasks. However, this training paradigm generally comes with a high cost, both in the amount of compute and data needed for training. In my Ph.D. thesis, I study the problem of multi-modal learning for information extraction and retrieval, with the main focus on new learning algorithms and network architectures to make the learning process more efficient. First, I introduce a novel network architecture for bidirectional decoding for the scene text recognition (STR) task. Next, I focus on the image-caption retrieval (ICR) task. I question if the results obtained in the metric learning field generalize to the ICR task. Finally, I focus on the reduction of shortcut learning for the ICR task. I introduce latent target decoding (LTD), a novel constraint-based learning algorithm which reduces shortcut feature learning by decoding the input caption in a semantic latent space.

Pages: 6925-6929

Page count: 5
