On the Multilingual Capabilities of Very Large-Scale English Language Models

Cited by: 0
Authors
Armengol-Estape, Jordi [1]
de Gibert Bonet, Ona [1]
Melero, Maite [1]
Affiliations
[1] Barcelona Supercomputing Center, Placa Eusebi Guell 1-3, Barcelona 08034, Spain
Keywords
Multilingual; Cross-lingual; Language Modeling
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Generative Pre-trained Transformers (GPTs) have recently been scaled to sizes unprecedented in the history of machine learning. These language models have been shown to exhibit outstanding zero-, one-, and few-shot learning capabilities across a number of different tasks. Nevertheless, aside from anecdotal experiences, little is known about their multilingual capabilities, given that the pre-training corpus is almost entirely composed of English text. In this work, we investigate the potential and limits of GPT-3 in three tasks: extractive question answering, text summarization, and natural language generation, for five different languages, as well as the effect of scale in terms of model size. Our results show that GPT-3 can be used not only as a powerful generative pre-trained model for English but for other languages as well, even for some with very little data in the training corpora, with room for improvement if optimization of the tokenization is addressed.
Pages: 3056-3068
Number of pages: 13