Length-Based Curriculum Learning for Efficient Pre-training of Language Models

Cited by: 1
Authors
Nagatsuka, Koichi [1 ]
Broni-Bediako, Clifford [1 ]
Atsumi, Masayasu [1 ]
Affiliation
[1] Soka Univ, Grad Sch Sci & Engn, Hachioji, Tokyo 1928577, Japan
Funding
Japan Science and Technology Agency (JST);
Keywords
Curriculum learning (CL); Pre-trained language models (PLMs); Length-based CL; Natural language processing (NLP);
DOI
10.1007/s00354-022-00198-8
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology];
Discipline Code
0812;
Abstract
Recently, pre-trained language models (PLMs) have become core components in a wide range of natural language processing applications. However, PLMs such as BERT and RoBERTa are typically trained on large amounts of unlabeled text, which requires an extremely high computational cost. Curriculum learning (CL), a strategy that trains a model from easy samples to hard ones, has the potential to alleviate this problem. Nevertheless, how to define a difficulty measure for the training samples of PLMs and how to design an effective training scheduler remain open questions. In this study, we focus on the length of the input text as the difficulty measure and propose a new CL approach called length-based CL. We analyze the effectiveness of the length-based difficulty measure in terms of convergence speed and GLUE scores using a limited corpus. By combining the maximum available batch size with the length-based difficulty measure, we show that our length-based CL model converges 1.5 times faster in pre-training and achieves better performance on downstream tasks. Furthermore, we expand the corpus to evaluate various pacing functions (training schedulers) for length-based CL with respect to computational time and generalization performance. Through experiments with this larger corpus, we find that our proposed Square scheduler requires less computational time in pre-training and obtains the best generalization performance on downstream tasks.
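To make the idea concrete, below is a minimal Python sketch of a length-based curriculum combined with a quadratic ("Square") pacing function. It assumes a simple token-count difficulty measure and an illustrative parameterization of the schedule; the names length_difficulty, square_pacing, curriculum_pool, and start_fraction are hypothetical, and this is not the authors' implementation, which additionally combines the curriculum with the maximum available batch size during BERT pre-training.

# Illustrative sketch only: length-based difficulty plus an assumed quadratic
# ("Square") pacing function; names and parameter values are hypothetical.

def length_difficulty(text, tokenizer=None):
    """Difficulty of a sample = its token count (whitespace split stands in
    for a subword tokenizer such as BERT's WordPiece)."""
    tokens = tokenizer(text) if tokenizer is not None else text.split()
    return len(tokens)

def square_pacing(step, total_steps, start_fraction=0.1):
    """Fraction of the length-sorted corpus visible at `step`, assumed to
    grow quadratically from `start_fraction` to 1.0."""
    progress = min(step / total_steps, 1.0)
    return min(1.0, start_fraction + (1.0 - start_fraction) * progress ** 2)

def curriculum_pool(corpus, step, total_steps):
    """Easiest-first slice of the corpus available for sampling at `step`."""
    ranked = sorted(corpus, key=length_difficulty)            # easy -> hard
    visible = max(1, int(square_pacing(step, total_steps) * len(ranked)))
    return ranked[:visible]

if __name__ == "__main__":
    corpus = [
        "a short sentence .",
        "a somewhat longer training sentence with more tokens in it .",
        "an even longer sentence that a length-based curriculum would postpone until late in pre-training .",
    ]
    for step in (0, 5000, 10000):
        pool = curriculum_pool(corpus, step, total_steps=10000)
        print(step, [length_difficulty(s) for s in pool])

Note that shorter sequences also fit a larger batch under a fixed memory budget, which is presumably how combining the curriculum with the maximum available batch size contributes to the faster pre-training reported in the abstract.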
Pages: 109-134
Number of pages: 26
Related Papers
50 items in total
  • [21] Towards Adversarial Attack on Vision-Language Pre-training Models
    Zhang, Jiaming
    Yi, Qi
    Sang, Jitao
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 5005 - 5013
  • [22] Pre-training Universal Language Representation
    Li, Yian
    Zhao, Hai
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (ACL-IJCNLP 2021), VOL 1, 2021, : 5122 - 5133
  • [23] Learning by Hallucinating: Vision-Language Pre-training with Weak Supervision
    Wang, Tzu-Jui Julius
    Laaksonen, Jorma
    Langer, Tomas
    Arponen, Heikki
    Bishop, Tom E.
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 1073 - 1083
  • [24] Distilling vision-language pre-training models with modality-specific meta-learning
    Ma, Xinge
    Wang, Jin
    Zhang, Xuejie
    KNOWLEDGE-BASED SYSTEMS, 2025, 315
  • [25] Survey: Transformer based video-language pre-training
    Ruan, Ludan
    Jin, Qin
    AI OPEN, 2022, 3 : 1 - 13
  • [26] Smaller Can Be Better: Efficient Data Selection for Pre-training Models
    Fang, Guang
    Wang, Shihui
    Wang, Mingxin
    Yang, Yulan
    Huang, Hao
    WEB AND BIG DATA, APWEB-WAIM 2024, PT I, 2024, 14961 : 327 - 342
  • [27] Multi-stage Pre-training over Simplified Multimodal Pre-training Models
    Liu, Tongtong
    Feng, Fangxiang
    Wang, Xiaojie
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 1 (ACL-IJCNLP 2021), 2021, : 2556 - 2565
  • [28] FEDBFPT: An Efficient Federated Learning Framework for BERT Further Pre-training
    Wang, Xin'ao
    Li, Huan
    Chen, Ke
    Shou, Lidan
    PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023, : 4344 - 4352
  • [29] SMAUG: Sparse Masked Autoencoder for Efficient Video-Language Pre-training
    Lin, Yuanze
    Wei, Chen
    Wang, Huiyu
    Yuille, Alan
    Xie, Cihang
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 2459 - 2469
  • [30] Efficient and Large Scale Pre-training Techniques for Japanese Natural Language Processing
    Kasagi, Akihiko
    Asaoka, Masahiro
    Tabuchi, Akihiro
    Oyama, Yosuke
    Honda, Takumi
    Sakai, Yasufumi
    Dang, Thang
    Tabaru, Tsuguchika
    2021 NINTH INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING (CANDAR 2021), 2021, : 108 - 113