Length-Based Curriculum Learning for Efficient Pre-training of Language Models

Cited by: 1
Authors
Nagatsuka, Koichi [1 ]
Broni-Bediako, Clifford [1 ]
Atsumi, Masayasu [1 ]
Affiliations
[1] Soka Univ, Grad Sch Sci & Engn, Hachioji, Tokyo 1928577, Japan
Funding
Japan Science and Technology Agency (JST);
Keywords
Curriculum learning (CL); Pre-trained language models (PLMs); Length-based CL; Natural language processing (NLP);
DOI
10.1007/s00354-022-00198-8
Chinese Library Classification (CLC) number
TP3 [Computing Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
Recently, pre-trained language models (PLMs) have become core components in a wide range of natural language processing applications. However, PLMs such as BERT and RoBERTa are typically trained on large unlabeled text corpora, which requires extremely high computational cost. Curriculum learning (CL), a strategy that trains a model from easy samples to hard ones, has the potential to alleviate this problem. Nevertheless, how to define a difficulty measure for the training samples of PLMs and how to design an effective training scheduler remain open questions. In this study, we adopt the length of the input text as the difficulty measure and propose a new CL approach called length-based CL. We analyze the effectiveness of the length-based difficulty measure in terms of convergence speed and GLUE scores using a limited amount of corpus. By combining the maximum available batch size with the length-based difficulty measure, we show that our length-based CL model achieves 1.5 times faster convergence in pre-training and better performance on downstream tasks. Furthermore, we expand the corpus to evaluate various pacing functions (training schedulers) for length-based CL with respect to computational time and generalization performance. Through experiments with this larger corpus, we find that our proposed Square scheduler requires less computational time in pre-training and obtains the best generalization performance on downstream tasks.
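To illustrate the idea described in the abstract, the following is a minimal Python sketch of a length-based curriculum: training samples are sorted by token count (the difficulty measure) and a square pacing function controls how much of the sorted corpus is available at each step. The function names (pacing_square, curriculum_batches), the exact form of the pacing function, and the fixed batch size are illustrative assumptions, not the authors' implementation; in particular, the dynamic use of the maximum available batch size described in the abstract is omitted for brevity.

# Minimal sketch (not the authors' code) of length-based curriculum learning
# with a square pacing function.
import random
from typing import Iterator, List, Sequence


def pacing_square(step: int, total_steps: int, start_frac: float = 0.1) -> float:
    """Fraction of the length-sorted corpus available at a given step.

    Grows quadratically ("square" pacing) from start_frac to 1.0 over training.
    The exact functional form here is an assumption for illustration.
    """
    progress = min(step / max(total_steps, 1), 1.0)
    return min(1.0, start_frac + (1.0 - start_frac) * progress ** 2)


def curriculum_batches(tokenized_corpus: Sequence[List[int]],
                       total_steps: int,
                       batch_size: int) -> Iterator[List[List[int]]]:
    """Yield one batch per step, sampling only from the shortest samples
    permitted by the pacing function (shorter input = easier sample)."""
    sorted_corpus = sorted(tokenized_corpus, key=len)  # length as difficulty
    for step in range(total_steps):
        frac = pacing_square(step, total_steps)
        cutoff = max(batch_size, int(frac * len(sorted_corpus)))
        available = sorted_corpus[:cutoff]
        yield random.sample(available, min(batch_size, len(available)))


if __name__ == "__main__":
    # Toy corpus: token-id lists of increasing length (5 to 512 tokens).
    corpus = [[0] * n for n in (5, 12, 30, 64, 128, 256, 384, 512)]
    for i, batch in enumerate(curriculum_batches(corpus, total_steps=4, batch_size=2)):
        print(f"step {i}: sample lengths {[len(s) for s in batch]}")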
Pages: 109-134
Number of pages: 26