Multi-modal Learning Algorithms and Network Architectures for Information Extraction and Retrieval

Cited by: 1
Authors
Bleeker, Maurits [1]
Affiliation
[1] Univ Amsterdam, Amsterdam, Netherlands
Keywords
Multi-modal representation learning; Multi-modal neural networks; Contrastive learning; Multi-modal embeddings
DOI
10.1145/3503161.3548757
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Large-scale (pre-)training has recently achieved great success on both uni- and multi-modal downstream evaluation tasks. However, this training paradigm generally comes with a high cost in both the compute and the data needed for training. In my Ph.D. thesis, I study the problem of multi-modal learning for information extraction and retrieval, focusing mainly on new learning algorithms and network architectures that make the learning process more efficient. First, I introduce a novel network architecture for bidirectional decoding for the scene text recognition (STR) task. Next, I focus on the image-caption retrieval (ICR) task and examine whether results obtained in the metric learning field generalize to it. Finally, I focus on reducing shortcut learning for the ICR task. I introduce latent target decoding (LTD), a novel constraint-based learning algorithm that reduces shortcut feature learning by decoding the input caption in a semantic latent space.
Pages: 6925-6929
Page count: 5
Related papers
50 in total
  • [31] Multi-modal Subspace Learning with Joint Graph Regularization for Cross-modal Retrieval
    Wang, Kaiye
    Wang, Wei
    He, Ran
    Wang, Liang
    Tan, Tieniu
    [J]. 2013 SECOND IAPR ASIAN CONFERENCE ON PATTERN RECOGNITION (ACPR 2013), 2013, : 236 - 240
  • [32] LCEMH: Label Correlation Enhanced Multi-modal Hashing for efficient multi-modal retrieval
    Zheng, Chaoqun
    Zhu, Lei
    Zhang, Zheng
    Duan, Wenjun
    Lu, Wenpeng
    [J]. INFORMATION SCIENCES, 2024, 659
  • [33] Optimized transfer learning based multi-modal medical image retrieval
    Abid, Muhammad Haris
    Ashraf, Rehan
    Mahmood, Toqeer
    Faisal, C. M. Nadeem
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (15) : 44069 - 44100
  • [34] Transfer Learning for the Visual Arts: The Multi-modal Retrieval of Iconclass Codes
    Banar, Nikolay
    Daelemans, Walter
    Kestemont, Mike
    [J]. ACM JOURNAL ON COMPUTING AND CULTURAL HERITAGE, 2023, 16 (02):
  • [35] Optimized transfer learning based multi-modal medical image retrieval
    Muhammad Haris Abid
    Rehan Ashraf
    Toqeer Mahmood
    C. M. Nadeem Faisal
    [J]. Multimedia Tools and Applications, 2024, 83 : 44069 - 44100
  • [36] Online Multi-Modal Distance Metric Learning with Application to Image Retrieval
    Wu, Pengcheng
    Hoi, Steven C. H.
    Zhao, Peilin
    Miao, Chunyan
    Liu, Zhi-Yong
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2016, 28 (02) : 454 - 467
  • [37] Adaptive information fusion network for multi-modal personality recognition
    Bao, Yongtang
    Liu, Xiang
    Qi, Yue
    Liu, Ruijun
    Li, Haojie
    [J]. COMPUTER ANIMATION AND VIRTUAL WORLDS, 2024, 35 (03)
  • [38] Fusing heterogeneous information for multi-modal attributed network embedding
    Yang Jieyi
    Zhu Feng
    Dong Yihong
    Qian Jiangbo
    [J]. Applied Intelligence, 2023, 53 : 22328 - 22347
  • [39] Fusing heterogeneous information for multi-modal attributed network embedding
    Yang, Jieyi
    Zhu, Feng
    Dong, Yihong
    Qian, Jiangbo
    [J]. APPLIED INTELLIGENCE, 2023, 53 (19) : 22328 - 22347
  • [40] Multi-modal False Information Detection Based on Adversarial Learning
    Tian, Tian
    Liu, Yudong
    Sun, Mengzhu
    Zhang, Xi
    [J]. 2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,