Multi-modal Learning Algorithms and Network Architectures for Information Extraction and Retrieval

Cited by: 1
Author
Bleeker, Maurits [1]
Affiliation
[1] Univ Amsterdam, Amsterdam, Netherlands
Keywords
Multi-modal representation learning; Multi-modal neural networks; Contrastive learning; Multi-modal embeddings
DOI
10.1145/3503161.3548757
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Large-scale (pre-)training has recently achieved great success on both uni-modal and multi-modal downstream evaluation tasks. However, this training paradigm generally comes at a high cost, both in the amount of compute and in the amount of data needed for training. In my Ph.D. thesis, I study the problem of multi-modal learning for information extraction and retrieval, with a main focus on new learning algorithms and network architectures that make the learning process more efficient. First, I introduce a novel network architecture for bidirectional decoding for the scene text recognition (STR) task. Next, I focus on the image-caption retrieval (ICR) task and examine whether the results obtained in the metric learning field generalize to it. Finally, I focus on reducing shortcut learning for the ICR task. I introduce latent target decoding (LTD), a novel constraint-based learning algorithm that reduces shortcut feature learning by decoding the input caption in a semantic latent space.
Pages: 6925-6929
Number of pages: 5