Multi-modal Learning Algorithms and Network Architectures for Information Extraction and Retrieval

Cited by: 1
Author
Bleeker, Maurits [1]
Affiliation
[1] Univ Amsterdam, Amsterdam, Netherlands
Keywords
Multi-modal representation learning; Multi-modal neural networks; Contrastive learning; Multi-modal embeddings
DOI
10.1145/3503161.3548757
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Large-scale (pre-)training has recently achieved great success on both uni- and multi-modal downstream evaluation tasks. However, this training paradigm generally comes with a high cost, both in the amount of compute and data needed for training. In my Ph.D. thesis, I study the problem of multi-modal learning for information extraction and retrieval, with the main focus on new learning algorithms and network architectures to make the learning process more efficient. First, I introduce a novel network architecture for bidirectional decoding for the scene text recognition (STR) task. Next, I focus on the image-caption retrieval (ICR) task. I question if the results obtained in the metric learning field generalize to the ICR task. Finally, I focus on the reduction of shortcut learning for the ICR task. I introduce latent target decoding (LTD), a novel constraint-based learning algorithm which reduces shortcut feature learning by decoding the input caption in a semantic latent space.
Pages: 6925-6929
Number of pages: 5