Connecting Pre-trained Language Models and Downstream Tasks via Properties of Representations

Cited by: 0
Authors
Wu, Chenwei [1 ]
Lee, Holden [2 ]
Ge, Rong [1 ]
Affiliations
[1] Duke Univ, Durham, NC 27706 USA
[2] Johns Hopkins Univ, Baltimore, MD USA
Keywords
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, researchers have found that representations learned by large-scale pre-trained language models are useful in various downstream tasks. However, there is little theoretical understanding of how pre-training performance relates to downstream task performance. In this paper, we analyze how this performance transfer depends on the properties of the downstream task and the structure of the representations. We consider a log-linear model in which a word is predicted from its context through a network whose last layer is a softmax. We show that even when the downstream task is highly structured and depends on a simple function of the hidden representation, there are still cases where a low pre-training loss cannot guarantee good performance on the downstream task. On the other hand, we propose and empirically validate the existence of an "anchor vector" in the representation space, and show that this assumption, together with properties of the downstream task, guarantees performance transfer.
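As a rough illustration of the log-linear setup described in the abstract (the notation below is our own sketch and may differ from the paper's), the context c is mapped by the network to a hidden representation f_theta(c), and the probability of the next word w is given by a softmax over output word embeddings:

\[
  p_\theta(w \mid c) \;=\; \frac{\exp\!\big(\phi_w^{\top} f_\theta(c)\big)}{\sum_{w' \in V} \exp\!\big(\phi_{w'}^{\top} f_\theta(c)\big)},
\]

where V is the vocabulary and \phi_w is the output embedding of word w. Pre-training minimizes the cross-entropy loss -\log p_\theta(w \mid c) over context-word pairs, while the downstream task is assumed to depend on a simple function of the hidden representation f_\theta(c).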
Pages: 23
Related Papers
50 records in total
  • [1] Bridging Pre-trained Models and Downstream Tasks for Source Code Understanding
    Wang, Deze
    Jia, Zhouyang
    Li, Shanshan
    Yu, Yue
    Xiong, Yun
    Dong, Wei
    Liao, Xiangke
    [J]. 2022 ACM/IEEE 44TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE 2022), 2022, : 287 - 298
  • [2] Temporal Effects on Pre-trained Models for Language Processing Tasks
    Agarwal, Oshin
    Nenkova, Ani
    [J]. TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2022, 10 : 904 - 921
  • [3] Quantifying Adaptability in Pre-trained Language Models with 500 Tasks
    Li, Belinda Z.
    Yu, Jane
    Khabsa, Madian
    Zettlemoyer, Luke
    Halevy, Alon
    Andreas, Jacob
    [J]. NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, : 4696 - 4715
  • [4] VLATTACK: Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models
    Yin, Ziyi
    Ye, Muchao
    Zhang, Tianrong
    Du, Tianyu
    Zhu, Jinguo
    Liu, Han
    Chen, Jinghui
    Wang, Ting
    Ma, Fenglong
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [5] Pre-trained Language Model Representations for Language Generation
    Edunov, Sergey
    Baevski, Alexei
    Auli, Michael
    [J]. 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, 2019, : 4052 - 4059
  • [6] On the Language Neutrality of Pre-trained Multilingual Representations
    Libovicky, Jindrich
    Rosa, Rudolf
    Fraser, Alexander
    [J]. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020, : 1663 - 1674
  • [7] Compression of Generative Pre-trained Language Models via Quantization
    Tao, Chaofan
    Hou, Lu
    Zhang, Wei
    Shang, Lifeng
    Jiang, Xin
    Liu, Qun
    Luo, Ping
    Wong, Ngai
    [J]. PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 4821 - 4836
  • [8] Multilingual Translation via Grafting Pre-trained Language Models
    Sun, Zewei
    Wang, Mingxuan
    Li, Lei
    [J]. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021, : 2735 - 2747
  • [9] Parallel Corpus Filtering via Pre-trained Language Models
    DiDi Labs
    [J]. arXiv, 2020,
  • [10] Voting from Nearest Tasks: Meta-Vote Pruning of Pre-trained Models for Downstream Tasks
    Zhao, Haiyan
    Zhou, Tianyi
    Long, Guodong
    Jiang, Jing
    Zhang, Chengqi
    [J]. MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES: RESEARCH TRACK, ECML PKDD 2023, PT II, 2023, 14170 : 52 - 68