Structure-inducing pre-training

Cited by: 0
Authors
Matthew B. A. McDermott
Brendan Yap
Peter Szolovits
Marinka Zitnik
Institutions
[1] Massachusetts Institute of Technology, Computer Science and Artificial Intelligence Laboratory
[2] Harvard Medical School, Department of Biomedical Informatics
[3] Broad Institute of MIT and Harvard
[4] Harvard Data Science Initiative
DOI: Not available
Abstract
Language model pre-training, and the general-purpose methods derived from it, have reshaped machine learning research. However, considerable uncertainty remains about why pre-training improves performance on downstream tasks. This challenge is pronounced when language model pre-training is applied in domains beyond natural language. Here we investigate this problem by analysing how pre-training methods impose relational structure in the induced per-sample latent spaces, that is, what constraints pre-training methods place on the distances or geometry between the pre-trained embeddings of samples. A comprehensive review of pre-training methods reveals that this question remains open, despite theoretical analyses showing the importance of understanding this form of induced structure. Based on this review, we introduce a pre-training framework that enables a granular and comprehensive understanding of how relational structure can be induced. We present a theoretical analysis of the framework from first principles and establish a connection between the relational inductive bias of pre-training and fine-tuning performance. Empirical studies spanning three data modalities and ten fine-tuning tasks confirm the theoretical analyses, inform the design of novel pre-training methods and establish consistent improvements over a compelling suite of methods.
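To make concrete what "imposing relational structure on a per-sample latent space" can mean, the sketch below (Python/PyTorch) shows one illustrative way to constrain pairwise embedding distances with a margin-based penalty over a relation graph. This is a minimal sketch, not the authors' implementation: the encoder producing z, the same_group relation matrix and the margin value are all hypothetical assumptions for illustration.

    import torch
    import torch.nn.functional as F

    def relational_structure_loss(z, same_group, margin=1.0):
        # z: (B, d) embeddings from any encoder (hypothetical stand-in
        #    for a pre-trained model's per-sample latent space).
        # same_group: (B, B) boolean matrix, True where two samples are
        #    linked in an assumed pre-training relation graph.
        dist = torch.cdist(z, z)                 # pairwise Euclidean distances
        pull = same_group.float() * dist.pow(2)  # draw linked samples together
        push = (~same_group).float() * F.relu(margin - dist).pow(2)  # separate unlinked samples
        off_diag = ~torch.eye(len(z), dtype=torch.bool)  # ignore self-distances
        return (pull + push)[off_diag].mean()

    # Toy usage: four samples in two relation groups.
    z = torch.randn(4, 16, requires_grad=True)
    groups = torch.tensor([0, 0, 1, 1])
    same = groups.unsqueeze(0) == groups.unsqueeze(1)
    relational_structure_loss(z, same).backward()

In practice a penalty of this kind would be added to a standard pre-training objective (for example, masked-token prediction), so that the latent space is shaped by both the data and the relation graph.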
Pages: 612-621
Page count: 9
Related papers (50 in total)
  • [31] Event Camera Data Pre-training
    Yang, Yan
    Pan, Liyuan
    Liu, Liu
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023: 10665-10675
  • [32] Pre-training Methods in Information Retrieval
    Fan, Yixing
    Xie, Xiaohui
    Cai, Yinqiong
    Chen, Jia
    Ma, Xinyu
    Li, Xiangsheng
    Zhang, Ruqing
    Guo, Jiafeng
    FOUNDATIONS AND TRENDS IN INFORMATION RETRIEVAL, 2022, 16 (03): 178-317
  • [33] Pre-training in Medical Data: A Survey
    Qiu, Yixuan
    Lin, Feng
    Chen, Weitong
    Xu, Miao
    MACHINE INTELLIGENCE RESEARCH, 2023, 20 (02): 147-179
  • [34] Quality Diversity for Visual Pre-Training
    Chavhan, Ruchika
    Gouk, Henry
    Li, Da
    Hospedales, Timothy
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023: 5361-5371
  • [35] Ontology Pre-training for Poison Prediction
    Glauer, Martin
    Neuhaus, Fabian
    Mossakowski, Till
    Hastings, Janna
    ADVANCES IN ARTIFICIAL INTELLIGENCE, KI 2023, 2023, 14236: 31-45
  • [36] Realistic Channel Models Pre-training
    Huangfu, Yourui
    Wang, Jian
    Xu, Chen
    Li, Rong
    Ge, Yiqun
    Wang, Xianbin
    Zhang, Huazi
    Wang, Jun
    2019 IEEE GLOBECOM WORKSHOPS (GC WKSHPS), 2019
  • [37] Automated Commit Intelligence by Pre-training
    Liu, Shangqing
    Li, Yanzhou
    Xie, Xiaofei
    Ma, Wei
    Meng, Guozhu
    Liu, Yang
    ACM TRANSACTIONS ON SOFTWARE ENGINEERING AND METHODOLOGY, 2024, 33 (08)
  • [38] Unsupervised Pre-Training for Voice Activation
    Kolesau, Aliaksei
    Sesok, Dmitrij
    APPLIED SCIENCES-BASEL, 2020, 10 (23): 1-13
  • [39] Pre-Training Without Natural Images
    Kataoka, Hirokatsu
    Okayasu, Kazushige
    Matsumoto, Asato
    Yamagata, Eisuke
    Yamada, Ryosuke
    Inoue, Nakamasa
    Nakamura, Akio
    Satoh, Yutaka
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2022, 130: 990-1007
  • [40] Pre-training Universal Language Representation
    Li, Yian
    Zhao, Hai
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (ACL-IJCNLP 2021), VOL 1, 2021: 5122-5133