Connecting Pre-trained Language Models and Downstream Tasks via Properties of Representations

Cited: 0
Authors
Wu, Chenwei [1 ]
Lee, Holden [2 ]
Ge, Rong [1 ]
Affiliations
[1] Duke Univ, Durham, NC 27706 USA
[2] Johns Hopkins Univ, Baltimore, MD USA
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, researchers have found that representations learned by large-scale pre-trained language models are useful in various downstream tasks. However, there is little theoretical understanding of how pre-training performance relates to downstream task performance. In this paper, we analyze how this performance transfer depends on the properties of the downstream task and the structure of the representations. We consider a log-linear model in which a word can be predicted from its context through a network with softmax as its last layer. We show that even if the downstream task is highly structured and depends on a simple function of the hidden representation, there are still cases in which a low pre-training loss does not guarantee good performance on the downstream task. On the other hand, we propose and empirically validate the existence of an "anchor vector" in the representation space, and show that this assumption, together with properties of the downstream task, guarantees performance transfer.
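
To make the abstract's setup concrete, the sketch below illustrates a generic log-linear word-prediction model with a softmax output layer and its per-example pre-training (cross-entropy) loss. It is a minimal illustration under assumptions, not the paper's exact construction: the encoder encode_context (an embedding average), the dimensions, and the random toy data are all hypothetical choices made only for this example.

    # Minimal sketch of a log-linear language model with a softmax last layer.
    # All names, dimensions, and data here are illustrative assumptions,
    # not the specific model analyzed in the paper.
    import numpy as np

    rng = np.random.default_rng(0)
    vocab_size, context_len, hidden_dim = 1000, 8, 64

    # Hypothetical encoder: maps a context (a window of word ids) to a hidden
    # representation f(context). Here it is just an embedding average; the
    # network can be more general as long as the last layer is a softmax.
    embeddings = rng.normal(scale=0.1, size=(vocab_size, hidden_dim))

    def encode_context(context_ids):
        return embeddings[context_ids].mean(axis=0)

    # Log-linear output layer: one weight vector u_w per vocabulary word.
    output_weights = rng.normal(scale=0.1, size=(vocab_size, hidden_dim))

    def next_word_distribution(context_ids):
        """p(w | context) = softmax over w of <u_w, f(context)>."""
        h = encode_context(context_ids)   # hidden representation f(context)
        logits = output_weights @ h       # one score per word w
        logits -= logits.max()            # numerical stability
        probs = np.exp(logits)
        return probs / probs.sum()

    def pretraining_loss(context_ids, target_id):
        """Cross-entropy of predicting the held-out word from its context."""
        return -np.log(next_word_distribution(context_ids)[target_id])

    # Toy usage: one random context and target word.
    context = rng.integers(vocab_size, size=context_len)
    target = int(rng.integers(vocab_size))
    print("loss on one example:", pretraining_loss(context, target))

In this picture, the paper's question is how a small value of the pre-training loss above relates to the performance of a downstream predictor built on top of the hidden representation f(context).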
Pages: 23