How does the pre-training objective affect what large language models learn about linguistic properties?

Cited by: 0
Authors
Alajrami, Ahmed [1 ]
Aletras, Nikolaos [1 ]
Affiliations
[1] Univ Sheffield, Dept Comp Sci, Sheffield, S Yorkshire, England
Funding
UK Research and Innovation (UKRI); Engineering and Physical Sciences Research Council (EPSRC);
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
Several pre-training objectives, such as masked language modeling (MLM), have been proposed for pre-training language models (e.g. BERT) with the aim of learning better language representations. However, to the best of our knowledge, no previous work has investigated how different pre-training objectives affect what BERT learns about linguistic properties. We hypothesize that linguistically motivated objectives such as MLM should help BERT acquire better linguistic knowledge than non-linguistically motivated objectives, for which the association between the input and the label to be predicted is not intuitive or is hard for humans to guess. To this end, we pre-train BERT with two linguistically motivated objectives and three non-linguistically motivated ones. We then probe for linguistic characteristics encoded in the representations of the resulting models. We find strong evidence that there are only small differences in probing performance between the representations learned by the two types of objectives. These surprising results question the dominant narrative of linguistically informed pre-training.
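For readers unfamiliar with the objective named in the abstract, the sketch below illustrates the standard BERT-style MLM input corruption (select 15% of non-special tokens; replace 80% of the selected positions with [MASK], 10% with a random token, and leave 10% unchanged). This is a minimal illustration under those well-known defaults: the function name and the use of PyTorch are assumptions of this sketch, not the authors' implementation, and the paper's non-linguistically motivated objectives are not shown.

import torch

def mask_for_mlm(input_ids, mask_token_id, vocab_size, special_token_ids,
                 mlm_probability=0.15):
    """Illustrative BERT-style MLM corruption for a batch of token ids.

    15% of non-special positions are selected for prediction; of those,
    roughly 80% are replaced with [MASK], 10% with a random token, and 10%
    are kept unchanged. Returns (corrupted_ids, labels), where labels are
    -100 at unselected positions so the cross-entropy loss ignores them.
    """
    labels = input_ids.clone()
    corrupted = input_ids.clone()

    # Never select special tokens such as [CLS], [SEP] or [PAD].
    select_prob = torch.full(input_ids.shape, mlm_probability)
    for sid in special_token_ids:
        select_prob.masked_fill_(input_ids == sid, 0.0)
    selected = torch.bernoulli(select_prob).bool()
    labels[~selected] = -100

    # 80% of the selected positions become [MASK].
    to_mask = torch.bernoulli(torch.full(input_ids.shape, 0.8)).bool() & selected
    corrupted[to_mask] = mask_token_id

    # Half of the remaining selected positions (10% overall) get a random token.
    to_random = (torch.bernoulli(torch.full(input_ids.shape, 0.5)).bool()
                 & selected & ~to_mask)
    random_ids = torch.randint(vocab_size, input_ids.shape)
    corrupted[to_random] = random_ids[to_random]

    # The final 10% of selected positions are deliberately left unchanged.
    return corrupted, labels

Probing, as referenced in the abstract, typically means training a small supervised classifier on frozen representations from each pre-trained model and comparing its performance across linguistic tasks.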
Pages: 131-147
Number of pages: 17
Related papers
15 items in total
  • [1] Does Pre-training Induce Systematic Inference? How Masked Language Models Acquire Commonsense Knowledge
    Porada, Ian
    Sordoni, Alessandro
    Cheung, Jackie Chi Kit
NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022: 4550-4557
  • [2] Evaluation of pre-training large language models on leadership-class supercomputers
    Yin, Junqi
    Dash, Sajal
    Gounley, John
    Wang, Feiyi
    Tourassi, Georgia
JOURNAL OF SUPERCOMPUTING, 2023, 79 (18): 20747-20768
  • [3] Affect Analysis in Arabic Text: Further Pre-Training Language Models for Sentiment and Emotion
    Alshehri, Wafa
    Al-Twairesh, Nora
    Alothaim, Abdulrahman
APPLIED SCIENCES-BASEL, 2023, 13 (09)
  • [5] How Much Do Modifications to Transformer Language Models Affect Their Ability to Learn Linguistic Knowledge?
    Sun, Simeng
    Dillon, Brian
    Iyyer, Mohit
PROCEEDINGS OF THE THIRD WORKSHOP ON INSIGHTS FROM NEGATIVE RESULTS IN NLP (INSIGHTS 2022), 2022: 46-53
  • [7] SPDF: Sparse Pre-training and Dense Fine-tuning for Large Language Models
    Thangarasa, Vithursan
    Gupta, Abhay
    Marshall, William
    Li, Tianda
    Leong, Kevin
    DeCoste, Dennis
    Lie, Sean
    Saxena, Shreyas
UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2023, 216: 2134-2146
  • [8] Pre-Wiring and Pre-Training: What Does a Neural Network Need to Learn Truly General Identity Rules?
    Alhama, Raquel G.
    Zuidema, Willem
JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2018, 61: 927-946
  • [9] WuDaoCorpora: A super large-scale Chinese corpora for pre-training language models
    Yuan, Sha
    Zhao, Hanyu
    Du, Zhengxiao
    Ding, Ming
    Liu, Xiao
    Cen, Yukuo
    Zou, Xu
    Yang, Zhilin
    Tang, Jie
AI OPEN, 2021, 2: 65-68
  • [10] A Conceptual Framework for Subdomain Specific Pre-Training of Large Language Models for Green Claim Detection
    Moodaley, Wayne
    Telukdarie, Arnesh
EUROPEAN JOURNAL OF SUSTAINABLE DEVELOPMENT, 2023, 12 (04): 319-329