Length-Based Curriculum Learning for Efficient Pre-training of Language Models

Cited by: 1
Authors
Nagatsuka, Koichi [1 ]
Broni-Bediako, Clifford [1 ]
Atsumi, Masayasu [1 ]
Affiliation
[1] Soka Univ, Grad Sch Sci & Engn, Hachioji, Tokyo 1928577, Japan
Funding
Japan Science and Technology Agency (JST);
Keywords
Curriculum learning (CL); Pre-trained language models (PLMs); Length-based CL; Natural language processing (NLP);
DOI
10.1007/s00354-022-00198-8
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology];
Discipline code
0812;
Abstract
Recently, pre-trained language models (PLMs) have become core components in a wide range of natural language processing applications. However, PLMs such as BERT and RoBERTa are typically trained on large amounts of unlabeled text, which requires extremely high computational cost. Curriculum learning (CL), a strategy that trains a model on easy samples before hard ones, has the potential to alleviate this problem. Nevertheless, how to define a difficulty measure for PLM training samples and how to design an effective training scheduler remain open questions. In this study, we adopt the length of the input text as the difficulty measure and propose a new CL approach called length-based CL. We analyze the effectiveness of the length-based difficulty measure in terms of convergence speed and GLUE scores using a limited corpus. By combining the maximum available batch size with the length-based difficulty measure, we show that our length-based CL model achieves 1.5 times faster convergence in pre-training and better performance on downstream tasks. Furthermore, we expand the corpus to evaluate various pacing functions (training schedulers) for length-based CL with respect to computational time and generalization performance. Through experiments with this larger corpus, we find that our proposed Square scheduler requires less computational time in pre-training and obtains the best generalization performance on downstream tasks.
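For concreteness, the sketch below shows one plausible way a length-based curriculum with a square-shaped pacing function could be wired up: the maximum input length grows quadratically with training progress, and the batch size is chosen to fill a fixed per-batch token budget (one reading of the "maximum available batch size" idea). The function names, the 64-to-512 length range, the quadratic-in-progress form, and the token-budget heuristic are illustrative assumptions based only on the abstract, not the authors' implementation.

def square_pacing(step, total_steps, start_len=64, max_len=512):
    """Illustrative square pacing function.

    Maps training progress to a maximum input length that grows
    quadratically from start_len to max_len. The concrete
    parameterization is assumed for illustration only.
    """
    progress = min(step / float(total_steps), 1.0)
    return int(start_len + (max_len - start_len) * progress ** 2)

def curriculum_batches(token_stream, step, total_steps, batch_tokens=65536):
    """Chunk a tokenized corpus into blocks no longer than the current
    curriculum length, then pick as many blocks as fit a fixed token
    budget per batch (so shorter blocks allow larger batch sizes).
    """
    cur_len = square_pacing(step, total_steps)
    batch_size = max(batch_tokens // cur_len, 1)
    blocks = [token_stream[i:i + cur_len]
              for i in range(0, len(token_stream), cur_len)]
    return blocks[:batch_size]

# Toy usage: early in training the model sees short blocks in large
# batches; late in training it sees long blocks in smaller batches.
toy_stream = list(range(10_000))  # stand-in for a tokenized corpus
early = curriculum_batches(toy_stream, step=1_000, total_steps=100_000)
late = curriculum_batches(toy_stream, step=95_000, total_steps=100_000)
print(len(early[0]), len(late[0]))  # block length grows as training progresses

In practice the pacing function would be queried inside the pre-training loop each time a batch is drawn; the quadratic schedule keeps sequences short for most of the early training, which is where the abstract reports the largest savings in computational time.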
Pages: 109-134
Number of pages: 26