Length-Based Curriculum Learning for Efficient Pre-training of Language Models

Cited by: 1
Authors
Nagatsuka, Koichi [1 ]
Broni-Bediako, Clifford [1 ]
Atsumi, Masayasu [1 ]
Affiliations
[1] Soka Univ, Grad Sch Sci & Engn, Hachioji, Tokyo 1928577, Japan
Funding
Japan Science and Technology Agency (JST);
Keywords
Curriculum learning (CL); Pre-trained language models (PLMs); Length-based CL; Natural language processing (NLP);
DOI
10.1007/s00354-022-00198-8
Chinese Library Classification (CLC) number
TP3 [Computing Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
Recently, pre-trained language models (PLMs) have become core components in a wide range of natural language processing applications. However, PLMs such as BERT and RoBERTa are typically trained on large unlabeled text corpora, which requires extremely high computational cost. Curriculum learning (CL), a strategy that trains a model from easy samples to hard ones, has the potential to alleviate this problem. Nevertheless, how to define a difficulty measure for the training samples of PLMs and how to design an effective training scheduler remain open questions. In this study, we adopt the length of the input text as the difficulty measure and propose a new CL approach called length-based CL. We analyze the effectiveness of the length-based difficulty measure in terms of convergence speed and GLUE scores using a limited amount of corpus. By combining the maximum available batch size with the length-based difficulty measure, we show that our length-based CL model achieves 1.5 times faster convergence in pre-training and better performance on downstream tasks. Furthermore, we expand the corpus to evaluate various pacing functions (training schedulers) for length-based CL with respect to computational time and generalization performance. Through experiments with this larger corpus, we find that our proposed Square scheduler requires less computational time in pre-training and obtains the best generalization performance on downstream tasks.
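To illustrate the idea described in the abstract, the following is a minimal Python sketch of a length-based curriculum: training samples are sorted by token count (the difficulty measure) and a square pacing function controls how much of the sorted corpus is available at each step. The function names (pacing_square, curriculum_batches), the exact form of the pacing function, and the fixed batch size are illustrative assumptions, not the authors' implementation; in particular, the dynamic use of the maximum available batch size described in the abstract is omitted for brevity.

# Minimal sketch (not the authors' code) of length-based curriculum learning
# with a square pacing function.
import random
from typing import Iterator, List, Sequence


def pacing_square(step: int, total_steps: int, start_frac: float = 0.1) -> float:
    """Fraction of the length-sorted corpus available at a given step.

    Grows quadratically ("square" pacing) from start_frac to 1.0 over training.
    The exact functional form here is an assumption for illustration.
    """
    progress = min(step / max(total_steps, 1), 1.0)
    return min(1.0, start_frac + (1.0 - start_frac) * progress ** 2)


def curriculum_batches(tokenized_corpus: Sequence[List[int]],
                       total_steps: int,
                       batch_size: int) -> Iterator[List[List[int]]]:
    """Yield one batch per step, sampling only from the shortest samples
    permitted by the pacing function (shorter input = easier sample)."""
    sorted_corpus = sorted(tokenized_corpus, key=len)  # length as difficulty
    for step in range(total_steps):
        frac = pacing_square(step, total_steps)
        cutoff = max(batch_size, int(frac * len(sorted_corpus)))
        available = sorted_corpus[:cutoff]
        yield random.sample(available, min(batch_size, len(available)))


if __name__ == "__main__":
    # Toy corpus: token-id lists of increasing length (5 to 512 tokens).
    corpus = [[0] * n for n in (5, 12, 30, 64, 128, 256, 384, 512)]
    for i, batch in enumerate(curriculum_batches(corpus, total_steps=4, batch_size=2)):
        print(f"step {i}: sample lengths {[len(s) for s in batch]}")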
Pages: 109-134
Number of pages: 26