Pre-trained language models: What do they know?

Cited by: 0
Authors
Guimaraes, Nuno [1 ,2 ]
Campos, Ricardo [1 ,3 ,4 ]
Jorge, Alipio [1 ,2 ]
Affiliations
[1] LIAAD INESCTEC, Porto, Portugal
[2] Univ Porto, Porto, Portugal
[3] Univ Beira Interior, Covilha, Portugal
[4] Polytech Inst Tomar, Ci2 Smart Cities Res Ctr, Tomar, Portugal
Keywords
large language models; natural language processing; pretrained language models
DOI
10.1002/widm.1518
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Large language models (LLMs) have substantially pushed artificial intelligence (AI) research and applications in the last few years. They currently achieve high effectiveness in different natural language processing (NLP) tasks, such as machine translation, named entity recognition, text classification, question answering, and text summarization. Recently, significant attention has been drawn to the capabilities of OpenAI's GPT models and their extremely accessible interface. LLMs are nowadays routinely used and studied for downstream tasks and specific applications with great success, pushing forward the state of the art in almost all of them. However, they also exhibit impressive inference capabilities when used off the shelf, without further training. In this paper, we aim to study the behavior of pre-trained language models (PLMs) in inference tasks they were not initially trained for. We therefore focus on very recent research on the inference capabilities of PLMs in selected tasks such as factual probing and common-sense reasoning. We highlight relevant achievements made by these models, as well as some of their current limitations that open opportunities for further research.
This article is categorized under:
Fundamental Concepts of Data and Knowledge > Key Design Issues in Data Mining
Technologies > Artificial Intelligence
Pages: 10
Related Papers (50 total)
  • [41] Evaluating and Inducing Personality in Pre-trained Language Models
    Jiang, Guangyuan
    Xu, Manjie
    Zhu, Song-Chun
    Han, Wenjuan
    Zhang, Chi
    Zhu, Yixin
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [42] Evaluating the Summarization Comprehension of Pre-Trained Language Models
    D. I. Chernyshev
    B. V. Dobrov
    [J]. Lobachevskii Journal of Mathematics, 2023, 44 : 3028 - 3039
  • [43] Pre-trained models for natural language processing: A survey
    XiPeng Qiu
    TianXiang Sun
    YiGe Xu
    YunFan Shao
    Ning Dai
    XuanJing Huang
    [J]. Science China Technological Sciences, 2020, 63 : 1872 - 1897
  • [44] Robust Lottery Tickets for Pre-trained Language Models
    Zheng, Rui
    Bao, Rong
    Zhou, Yuhao
    Liang, Di
    Wang, Sirui
    Wu, Wei
    Gui, Tao
    Zhang, Qi
    Huang, Xuanjing
    [J]. PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 2211 - 2224
  • [45] Pre-Trained Language Models for Text Generation: A Survey
    Li, Junyi
    Tang, Tianyi
    Zhao, Wayne Xin
    Nie, Jian-Yun
    Wen, Ji-Rong
    [J]. ACM COMPUTING SURVEYS, 2024, 56 (09)
  • [46] Leveraging pre-trained language models for code generation
    Soliman, Ahmed
    Shaheen, Samir
    Hadhoud, Mayada
    [J]. COMPLEX & INTELLIGENT SYSTEMS, 2024, 10 (03) : 3955 - 3980
  • [47] What Makes Pre-trained Language Models Better Zero-shot Learners?
    Lu, Jinghui
    Zhu, Dongsheng
    Han, Weidong
    Zhao, Rui
    Mac Namee, Brian
    Tan, Fei
    [J]. PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023, : 2288 - 2303
  • [48] Modeling Second Language Acquisition with pre-trained neural language models
    Palenzuela, Alvaro J. Jimenez
    Frasincar, Flavius
    Trusca, Maria Mihaela
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2022, 207
  • [49] μBERT: Mutation Testing using Pre-Trained Language Models
    Degiovanni, Renzo
    Papadakis, Mike
    [J]. 2022 IEEE 15TH INTERNATIONAL CONFERENCE ON SOFTWARE TESTING, VERIFICATION AND VALIDATION WORKSHOPS (ICSTW 2022), 2022, : 160 - 169
  • [50] In-Context Analogical Reasoning with Pre-Trained Language Models
    Hu, Xiaoyang
    Storks, Shane
    Lewis, Richard L.
    Chai, Joyce
    [J]. PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023, : 1953 - 1969