Length-Based Curriculum Learning for Efficient Pre-training of Language Models

Cited by: 1
Authors
Nagatsuka, Koichi [1 ]
Broni-Bediako, Clifford [1 ]
Atsumi, Masayasu [1 ]
Affiliation
[1] Soka Univ, Grad Sch Sci & Engn, Hachioji, Tokyo 1928577, Japan
Funding
Japan Science and Technology Agency (JST);
Keywords
Curriculum learning (CL); Pre-trained language models (PLMs); Length-based CL; Natural language processing (NLP);
DOI
10.1007/s00354-022-00198-8
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology];
Discipline code
0812;
Abstract
Recently, pre-trained language models (PLMs) have become core components in a wide range of natural language processing applications. However, PLMs such as BERT and RoBERTa are typically trained on large amounts of unlabeled text, which requires extremely high computational cost. Curriculum learning (CL), a strategy that trains a model on easy samples before hard ones, has the potential to alleviate this problem. Nevertheless, how to define a difficulty measure for PLM training samples and how to design an effective training scheduler remain open questions. In this study, we adopt the length of the input text as the difficulty measure and propose a new CL approach called length-based CL. We analyze the effectiveness of the length-based difficulty measure in terms of convergence speed and GLUE scores using a limited corpus. By combining the maximum available batch size with the length-based difficulty measure, we show that our length-based CL model achieves 1.5 times faster convergence in pre-training and better performance on downstream tasks. Furthermore, we expand the corpus to evaluate various pacing functions (training schedulers) for length-based CL with respect to computational time and generalization performance. Through experiments with this larger corpus, we find that our proposed Square scheduler requires less computational time in pre-training and obtains the best generalization performance on downstream tasks.
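For concreteness, the sketch below shows one plausible way a length-based curriculum with a square-shaped pacing function could be wired up: the maximum input length grows quadratically with training progress, and the batch size is chosen to fill a fixed per-batch token budget (one reading of the "maximum available batch size" idea). The function names, the 64-to-512 length range, the quadratic-in-progress form, and the token-budget heuristic are illustrative assumptions based only on the abstract, not the authors' implementation.

def square_pacing(step, total_steps, start_len=64, max_len=512):
    """Illustrative square pacing function.

    Maps training progress to a maximum input length that grows
    quadratically from start_len to max_len. The concrete
    parameterization is assumed for illustration only.
    """
    progress = min(step / float(total_steps), 1.0)
    return int(start_len + (max_len - start_len) * progress ** 2)

def curriculum_batches(token_stream, step, total_steps, batch_tokens=65536):
    """Chunk a tokenized corpus into blocks no longer than the current
    curriculum length, then pick as many blocks as fit a fixed token
    budget per batch (so shorter blocks allow larger batch sizes).
    """
    cur_len = square_pacing(step, total_steps)
    batch_size = max(batch_tokens // cur_len, 1)
    blocks = [token_stream[i:i + cur_len]
              for i in range(0, len(token_stream), cur_len)]
    return blocks[:batch_size]

# Toy usage: early in training the model sees short blocks in large
# batches; late in training it sees long blocks in smaller batches.
toy_stream = list(range(10_000))  # stand-in for a tokenized corpus
early = curriculum_batches(toy_stream, step=1_000, total_steps=100_000)
late = curriculum_batches(toy_stream, step=95_000, total_steps=100_000)
print(len(early[0]), len(late[0]))  # block length grows as training progresses

In practice the pacing function would be queried inside the pre-training loop each time a batch is drawn; the quadratic schedule keeps sequences short for most of the early training, which is where the abstract reports the largest savings in computational time.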
Pages: 109-134
Number of pages: 26