Length-Based Curriculum Learning for Efficient Pre-training of Language Models

Cited by: 1
Authors
Nagatsuka, Koichi [1 ]
Broni-Bediako, Clifford [1 ]
Atsumi, Masayasu [1 ]
Affiliation
[1] Soka Univ, Grad Sch Sci & Engn, Hachioji, Tokyo 1928577, Japan
Funding
Japan Science and Technology Agency (JST);
Keywords
Curriculum learning (CL); Pre-trained language models (PLMs); Length-based CL; Natural language processing (NLP);
DOI
10.1007/s00354-022-00198-8
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology];
Discipline Code
0812;
Abstract
Recently, pre-trained language models (PLMs) have become core components in a wide range of natural language processing applications. However, PLMs such as BERT and RoBERTa are typically trained on large amounts of unlabeled text, which requires an extremely high computational cost. Curriculum learning (CL), a strategy that trains a model from easy samples to hard ones, has the potential to alleviate this problem. Nevertheless, how to define a difficulty measure for the training samples of PLMs and how to design an effective training scheduler remain open questions. In this study, we focus on the length of the input text as the difficulty measure and propose a new CL approach called length-based CL. We analyze the effectiveness of the length-based difficulty measure in terms of convergence speed and GLUE scores using a limited corpus. By combining the maximum available batch size with the length-based difficulty measure, we show that our length-based CL model converges 1.5 times faster in pre-training and achieves better performance on downstream tasks. Furthermore, we expand the corpus to evaluate various pacing functions (training schedulers) for length-based CL with respect to computational time and generalization performance. Through experiments with this larger corpus, we find that our proposed Square scheduler requires less computational time in pre-training and obtains the best generalization performance on downstream tasks.
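To make the idea concrete, below is a minimal Python sketch of a length-based curriculum combined with a quadratic ("Square") pacing function. It assumes a simple token-count difficulty measure and an illustrative parameterization of the schedule; the names length_difficulty, square_pacing, curriculum_pool, and start_fraction are hypothetical, and this is not the authors' implementation, which additionally combines the curriculum with the maximum available batch size during BERT pre-training.

# Illustrative sketch only: length-based difficulty plus an assumed quadratic
# ("Square") pacing function; names and parameter values are hypothetical.

def length_difficulty(text, tokenizer=None):
    """Difficulty of a sample = its token count (whitespace split stands in
    for a subword tokenizer such as BERT's WordPiece)."""
    tokens = tokenizer(text) if tokenizer is not None else text.split()
    return len(tokens)

def square_pacing(step, total_steps, start_fraction=0.1):
    """Fraction of the length-sorted corpus visible at `step`, assumed to
    grow quadratically from `start_fraction` to 1.0."""
    progress = min(step / total_steps, 1.0)
    return min(1.0, start_fraction + (1.0 - start_fraction) * progress ** 2)

def curriculum_pool(corpus, step, total_steps):
    """Easiest-first slice of the corpus available for sampling at `step`."""
    ranked = sorted(corpus, key=length_difficulty)            # easy -> hard
    visible = max(1, int(square_pacing(step, total_steps) * len(ranked)))
    return ranked[:visible]

if __name__ == "__main__":
    corpus = [
        "a short sentence .",
        "a somewhat longer training sentence with more tokens in it .",
        "an even longer sentence that a length-based curriculum would postpone until late in pre-training .",
    ]
    for step in (0, 5000, 10000):
        pool = curriculum_pool(corpus, step, total_steps=10000)
        print(step, [length_difficulty(s) for s in pool])

Note that shorter sequences also fit a larger batch under a fixed memory budget, which is presumably how combining the curriculum with the maximum available batch size contributes to the faster pre-training reported in the abstract.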
Pages: 109-134
Number of pages: 26
Related Papers
50 items in total
  • [21] Towards Adversarial Attack on Vision-Language Pre-training Models
    Zhang, Jiaming
    Yi, Qi
    Sang, Jitao
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 5005 - 5013
  • [22] Pre-training Universal Language Representation
    Li, Yian
    Zhao, Hai
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (ACL-IJCNLP 2021), VOL 1, 2021, : 5122 - 5133
  • [23] Learning by Hallucinating: Vision-Language Pre-training with Weak Supervision
    Wang, Tzu-Jui Julius
    Laaksonen, Jorma
    Langer, Tomas
    Arponen, Heikki
    Bishop, Tom E.
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 1073 - 1083
  • [24] Distilling vision-language pre-training models with modality-specific meta-learning
    Ma, Xinge
    Wang, Jin
    Zhang, Xuejie
    KNOWLEDGE-BASED SYSTEMS, 2025, 315
  • [25] Survey: Transformer based video-language pre-training
    Ruan, Ludan
    Jin, Qin
    AI OPEN, 2022, 3 : 1 - 13
  • [26] Smaller Can Be Better: Efficient Data Selection for Pre-training Models
    Fang, Guang
    Wang, Shihui
    Wang, Mingxin
    Yang, Yulan
    Huang, Hao
    WEB AND BIG DATA, APWEB-WAIM 2024, PT I, 2024, 14961 : 327 - 342
  • [27] Multi-stage Pre-training over Simplified Multimodal Pre-training Models
    Liu, Tongtong
    Feng, Fangxiang
    Wang, Xiaojie
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 1 (ACL-IJCNLP 2021), 2021, : 2556 - 2565
  • [28] FEDBFPT: An Efficient Federated Learning Framework for BERT Further Pre-training
    Wang, Xin'ao
    Li, Huan
    Chen, Ke
    Shou, Lidan
    PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023, : 4344 - 4352
  • [29] SMAUG: Sparse Masked Autoencoder for Efficient Video-Language Pre-training
    Lin, Yuanze
    Wei, Chen
    Wang, Huiyu
    Yuille, Alan
    Xie, Cihang
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 2459 - 2469
  • [30] Efficient and Large Scale Pre-training Techniques for Japanese Natural Language Processing
    Kasagi, Akihiko
    Asaoka, Masahiro
    Tabuchi, Akihiro
    Oyama, Yosuke
    Honda, Takumi
    Sakai, Yasufumi
    Dang, Thang
    Tabaru, Tsuguchika
    2021 NINTH INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING (CANDAR 2021), 2021, : 108 - 113