Language Models are Few-Shot Learners

Cited by: 0
Authors
Brown, Tom B.
Mann, Benjamin
Ryder, Nick
Subbiah, Melanie
Kaplan, Jared [1,2]
Dhariwal, Prafulla
Neelakantan, Arvind
Shyam, Pranav
Sastry, Girish
Askell, Amanda
Agarwal, Sandhini
Herbert-Voss, Ariel
Krueger, Gretchen
Henighan, Tom
Child, Rewon
Ramesh, Aditya
Ziegler, Daniel M.
Wu, Jeffrey
Winter, Clemens
Hesse, Christopher
Chen, Mark
Sigler, Eric
Litwin, Mateusz
Gray, Scott
Chess, Benjamin
Clark, Jack
Berner, Christopher
McCandlish, Sam
Radford, Alec
Sutskever, Ilya
Amodei, Dario
Affiliations
[1] Johns Hopkins Univ, Baltimore, MD 21218 USA
[2] OpenAI, San Francisco, CA 94110 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We demonstrate that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even becoming competitive with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks. We also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora.
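The mechanism the abstract refers to is in-context few-shot learning: the task and a handful of demonstrations are written directly into the prompt, and the frozen model simply continues the pattern, with no gradient updates. Below is a minimal sketch of that prompting pattern using the Hugging Face text-generation pipeline; since GPT-3 itself is only available through an API, the much smaller GPT-2 is assumed here purely as a local stand-in, and the English-to-French demonstrations follow the example format shown in the paper.

# Minimal sketch of in-context few-shot prompting (GPT-2 as an assumed
# stand-in for GPT-3). The task is specified only through the prompt text;
# the model weights are never updated.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# A few demonstrations of the task, followed by the query to be completed.
prompt = (
    "Translate English to French:\n"
    "sea otter => loutre de mer\n"
    "peppermint => menthe poivrée\n"
    "cheese =>"
)

# Greedy decoding of a short completion; the model is expected to continue
# the demonstrated pattern with a French translation.
result = generator(prompt, max_new_tokens=5, do_sample=False)
print(result[0]["generated_text"])

A small stand-in model will often get such completions wrong; the paper's point is that this purely prompt-based setup becomes competitive only as the model is scaled up to GPT-3's size.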
Pages: 25
Related papers
50 records in total
  • [1] Adapting Language-Audio Models as Few-Shot Audio Learners
    Liang, Jinhua
    Liu, Xubo
    Liu, Haohe
    Phan, Huy
    Benetos, Emmanouil
    Plumbley, Mark D.
    Wang, Wenwu
    INTERSPEECH 2023, 2023, : 276 - 280
  • [2] Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners
    Wang, Zhenhailong
    Li, Manling
    Xu, Ruochen
    Zhou, Luowei
    Lei, Jie
    Lin, Xudong
    Wang, Shuohang
    Yang, Ziyi
    Zhu, Chenguang
    Hoiem, Derek
    Chang, Shih-Fu
    Bansal, Mohit
    Ji, Heng
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [3] Language Models are Few-Shot Butlers
    Micheli, Vincent
    Fleuret, Francois
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 9312 - 9318
  • [4] Making Pre-trained Language Models Better Few-shot Learners
    Gao, Tianyu
    Fisch, Adam
    Chen, Danqi
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (ACL-IJCNLP 2021), VOL 1, 2021, : 3816 - 3830
  • [5] It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners
    Schick, Timo
    Schuetze, Hinrich
    2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 2339 - 2352
  • [6] Few-shot Subgoal Planning with Language Models
    Logeswaran, Lajanugen
    Fu, Yao
    Lee, Moontae
    Lee, Honglak
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, : 5493 - 5506
  • [7] True Few-Shot Learning with Language Models
    Perez, Ethan
    Kiela, Douwe
    Cho, Kyunghyun
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021,
  • [8] Defending Pre-trained Language Models as Few-shot Learners against Backdoor Attacks
    Xi, Zhaohan
    Du, Tianyu
    Li, Changjiang
    Pang, Ren
    Ji, Shouling
    Chen, Jinghui
    Ma, Fenglong
    Wang, Ting
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [9] Are LSTMs good few-shot learners?
    Huisman, Mike
    Moerland, Thomas M.
    Plaat, Aske
    van Rijn, Jan N.
    MACHINE LEARNING, 2023, 112 (11) : 4635 - 4662