ELLE: Efficient Lifelong Pre-training for Emerging Data

Cited by: 0
Authors
Qin, Yujia [1 ,2 ,3 ]
Zhang, Jiajie [1 ,2 ,3 ]
Lin, Yankai [4 ]
Liu, Zhiyuan [1 ,2 ,3 ,5 ,6 ]
Li, Peng [7 ,9 ]
Sun, Maosong [1 ,2 ,3 ,5 ,6 ,8 ]
Zhou, Jie [4 ]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, Beijing, Peoples R China
[2] Beijing Natl Res Ctr Informat Sci & Technol, Beijing, Peoples R China
[3] Tsinghua Univ, Inst Artificial Intelligence, Beijing, Peoples R China
[4] Tencent Inc, Pattern Recognit Ctr, WeChat AI, Shenzhen, Peoples R China
[5] Tsinghua Univ, Int Innovat Ctr, Shanghai, Peoples R China
[6] Beijing Acad Artificial Intelligence, Beijing, Peoples R China
[7] Tsinghua Univ, Inst AI Ind Res Air, Beijing, Peoples R China
[8] Jiangsu Collaborat Innovat Ctr Language Abil, Xuzhou, Jiangsu, Peoples R China
[9] Tencent, Shenzhen, Peoples R China
Funding
National Research Foundation of Singapore; National Key Research and Development Program of China;
Keywords
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Number
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Current pre-trained language models (PLMs) are typically trained on static data, ignoring that in real-world scenarios, streaming data from various sources may continuously grow. This requires PLMs to integrate information from all sources in a lifelong manner. Although this goal could be achieved by exhaustive pre-training on all the existing data, such a process is known to be computationally expensive. To this end, we propose ELLE, aiming at efficient lifelong pre-training for emerging data. Specifically, ELLE consists of (1) function preserved model expansion, which flexibly expands an existing PLM's width and depth to improve the efficiency of knowledge acquisition; and (2) pre-trained domain prompts, which disentangle the versatile knowledge learned during pre-training and stimulate the proper knowledge for downstream tasks. We experiment with ELLE on streaming data from 5 domains using BERT and GPT. The results show the superiority of ELLE over various lifelong learning baselines in both pre-training efficiency and downstream performance.
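As a concrete illustration of the first component, the sketch below shows a Net2Net-style function-preserving width expansion for a pair of linear layers: the widened pair computes exactly the same function as the original before further pre-training resumes. This is a minimal sketch of the general technique the abstract refers to, not the authors' implementation; the function name widen_linear_pair and the layer sizes are illustrative assumptions.

    # Minimal sketch of Net2Net-style function-preserving width expansion,
    # in the spirit of ELLE's "function preserved model expansion".
    # Names and sizes are illustrative, not the authors' code.
    import torch
    import torch.nn as nn

    def widen_linear_pair(fc_in: nn.Linear, fc_out: nn.Linear, new_width: int):
        """Widen the hidden dimension between fc_in and fc_out to new_width
        while keeping fc_out(relu(fc_in(x))) unchanged (Net2Net remapping)."""
        old_width = fc_in.out_features
        assert new_width >= old_width

        # Map each new hidden unit to an existing one: identity for the first
        # old_width units, random copies for the extra ones.
        mapping = torch.cat([
            torch.arange(old_width),
            torch.randint(0, old_width, (new_width - old_width,)),
        ])
        counts = torch.bincount(mapping, minlength=old_width).float()

        new_in = nn.Linear(fc_in.in_features, new_width)
        new_out = nn.Linear(new_width, fc_out.out_features)
        with torch.no_grad():
            new_in.weight.copy_(fc_in.weight[mapping])
            new_in.bias.copy_(fc_in.bias[mapping])
            # Divide duplicated columns so their summed contribution is unchanged.
            new_out.weight.copy_(fc_out.weight[:, mapping] / counts[mapping])
            new_out.bias.copy_(fc_out.bias)
        return new_in, new_out

    # Quick check: the widened pair computes the same function.
    x = torch.randn(4, 16)
    fc1, fc2 = nn.Linear(16, 32), nn.Linear(32, 8)
    wide1, wide2 = widen_linear_pair(fc1, fc2, 48)
    assert torch.allclose(fc2(torch.relu(fc1(x))),
                          wide2(torch.relu(wide1(x))), atol=1e-5)

The same remapping idea can be applied to a Transformer's weight matrices when enlarging a PLM's hidden size; depth expansion analogously initializes newly inserted layers so that the overall function is preserved as closely as possible before training on the new data stream.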
Pages: 2789 - 2810
Number of pages: 22