Structure-inducing pre-training

Cited by: 0
Authors
Matthew B. A. McDermott
Brendan Yap
Peter Szolovits
Marinka Zitnik
Affiliations
[1] Massachusetts Institute of Technology, Computer Science and Artificial Intelligence Laboratory
[2] Harvard Medical School, Department of Biomedical Informatics
[3] Broad Institute of MIT and Harvard
[4] Harvard Data Science Initiative
DOI: Not available
Abstract
Language model pre-training and the derived general-purpose methods have reshaped machine learning research. However, there remains considerable uncertainty regarding why pre-training improves performance on downstream tasks. This challenge is pronounced when using language model pre-training in domains outside of natural language. Here we investigate this problem by analysing how pre-training methods impose relational structure in induced per-sample latent spaces—that is, what constraints pre-training methods impose on the distance or geometry between the pre-trained embeddings of samples. A comprehensive review of pre-training methods reveals that this question remains open, despite theoretical analyses showing the importance of understanding this form of induced structure. Based on this review, we introduce a pre-training framework that enables a granular and comprehensive understanding of how relational structure can be induced. We present a theoretical analysis of the framework from first principles and establish a connection between the relational inductive bias of pre-training and fine-tuning performance. Empirical studies spanning three data modalities and ten fine-tuning tasks confirm the theoretical analyses, inform the design of novel pre-training methods and establish consistent improvements over a compelling suite of methods.
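To make the idea of a relational constraint on the latent space concrete, the sketch below shows one possible way a structure-inducing term over a per-sample relation graph could be added to an ordinary pre-training loss. It is a minimal illustration in PyTorch, not the paper's implementation: the function names, the margin-based hinge form and the loss weighting are assumptions made for exposition only.

import torch
import torch.nn.functional as F

def structure_inducing_loss(z_anchor, z_pos, z_neg, margin=1.0):
    # Hinge loss on embedding distances over a per-sample relation graph:
    # z_anchor, z_pos, z_neg are (batch, dim) embeddings of an anchor sample,
    # a sample linked to it in the relation graph, and an unlinked sample.
    d_pos = (z_anchor - z_pos).norm(dim=-1)   # distance to a linked sample
    d_neg = (z_anchor - z_neg).norm(dim=-1)   # distance to an unlinked sample
    return F.relu(d_pos - d_neg + margin).mean()

def total_pretraining_loss(lm_loss, z_anchor, z_pos, z_neg, weight=0.5):
    # Interpolate between a standard per-token pre-training objective
    # (e.g. masked language modelling) and the structure-inducing term.
    return (1 - weight) * lm_loss + weight * structure_inducing_loss(z_anchor, z_pos, z_neg)

# Illustrative usage with random embeddings from a hypothetical encoder.
z_a, z_p, z_n = (torch.randn(8, 128) for _ in range(3))
lm_loss = torch.tensor(2.3)  # stand-in for a masked-language-modelling loss
loss = total_pretraining_loss(lm_loss, z_a, z_p, z_n)

In this kind of objective, the weight controls how strongly the relation graph's geometry is imposed on the latent space relative to the ordinary pre-training signal.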
Pages: 612-621
Number of pages: 9
Related papers (50 in total)
  • [21] Dialogue-oriented Pre-training
    Xu, Yi
    Zhao, Hai
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 2663 - 2673
  • [22] Pre-training Assessment Through the Web
    Kenneth Wong
    Reggie Kwan
    Jimmy SF Chan
    Journal of Xiamen University (Natural Science), 2002, (S1): 297 - 297
  • [23] On Masked Pre-training and the Marginal Likelihood
    Moreno-Munoz, Pablo
    Recasens, Pol G.
    Hauberg, Soren
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [24] Understanding tables with intermediate pre-training
    Eisenschlos, Julian Martin
    Krichene, Syrine
    Mueller, Thomas
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020,
  • [25] Speech Pre-training with Acoustic Piece
    Ren, Shuo
    Liu, Shujie
    Wu, Yu
    Zhou, Long
    Wei, Furu
    INTERSPEECH 2022, 2022, : 2648 - 2652
  • [26] Unsupervised Pre-Training for Detection Transformers
    Dai, Zhigang
    Cai, Bolun
    Lin, Yugeng
    Chen, Junying
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (11) : 12772 - 12782
  • [27] Structural Pre-training for Dialogue Comprehension
    Zhang, Zhuosheng
    Zhao, Hai
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (ACL-IJCNLP 2021), VOL 1, 2021, : 5134 - 5145
  • [28] Simulated SAR for ATR pre-training
    Willis, Christopher J.
    ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING IN DEFENSE APPLICATIONS III, 2021, 11870
  • [29] Robot Learning with Sensorimotor Pre-training
    Radosavovic, Ilija
    Shi, Baifeng
    Fu, Letian
    Goldberg, Ken
    Darrell, Trevor
    Malik, Jitendra
    CONFERENCE ON ROBOT LEARNING, VOL 229, 2023, 229
  • [30] Rethinking pre-training on medical imaging
    Wen, Yang
    Chen, Leiting
    Deng, Yu
    Zhou, Chuan
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2021, 78