15 results in total
- [1] Does Pre-training Induce Systematic Inference? How Masked Language Models Acquire Commonsense Knowledge. NAACL 2022: The 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022: 4550-4557
- [2] Evaluation of Pre-training Large Language Models on Leadership-Class Supercomputers. The Journal of Supercomputing, 2023, 79(18): 20747-20768
- [3] Affect Analysis in Arabic Text: Further Pre-Training Language Models for Sentiment and Emotion. Applied Sciences-Basel, 2023, 13(9)
- [4] How Much Do Modifications to Transformer Language Models Affect Their Ability to Learn Linguistic Knowledge? Proceedings of the Third Workshop on Insights from Negative Results in NLP (Insights 2022), 2022: 46-53
- [5] SPDF: Sparse Pre-training and Dense Fine-tuning for Large Language Models. Uncertainty in Artificial Intelligence, 2023, 216: 2134-2146
- [6] Pre-Wiring and Pre-Training: What Does a Neural Network Need to Learn Truly General Identity Rules? Journal of Artificial Intelligence Research, 2018, 61: 927-946
- [7] WuDaoCorpora: A Super Large-Scale Chinese Corpora for Pre-training Language Models. AI Open, 2021, 2: 65-68
- [8] A Conceptual Framework for Subdomain Specific Pre-Training of Large Language Models for Green Claim Detection. European Journal of Sustainable Development, 2023, 12(4): 319-329