ELLE: Efficient Lifelong Pre-training for Emerging Data

Cited by: 0
Authors
Qin, Yujia [1 ,2 ,3 ]
Zhang, Jiajie [1 ,2 ,3 ]
Lin, Yankai [4 ]
Liu, Zhiyuan [1 ,2 ,3 ,5 ,6 ]
Li, Peng [7 ,9 ]
Sun, Maosong [1 ,2 ,3 ,5 ,6 ,8 ]
Zhou, Jie [4 ]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, Beijing, Peoples R China
[2] Beijing Natl Res Ctr Informat Sci & Technol, Beijing, Peoples R China
[3] Tsinghua Univ, Inst Artificial Intelligence, Beijing, Peoples R China
[4] Tencent Inc, Pattern Recognit Ctr, WeChat AI, Shenzhen, Peoples R China
[5] Tsinghua Univ, Int Innovat Ctr, Shanghai, Peoples R China
[6] Beijing Acad Artificial Intelligence, Beijing, Peoples R China
[7] Tsinghua Univ, Inst AI Ind Res (AIR), Beijing, Peoples R China
[8] Jiangsu Collaborat Innovat Ctr Language Abil, Xuzhou, Jiangsu, Peoples R China
[9] Tencent, Shenzhen, Peoples R China
Funding
National Research Foundation, Singapore; National Key Research and Development Program of China;
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Current pre-trained language models (PLMs) are typically trained on static data, ignoring that in real-world scenarios, streaming data from various sources may continuously grow. This requires PLMs to integrate the information from all the sources in a lifelong manner. Although this goal could be achieved by exhaustive pre-training on all the existing data, such a process is known to be computationally expensive. To this end, we propose ELLE, aiming at efficient lifelong pre-training for emerging data. Specifically, ELLE consists of (1) function-preserved model expansion, which flexibly expands an existing PLM's width and depth to improve the efficiency of knowledge acquisition; and (2) pre-trained domain prompts, which disentangle the versatile knowledge learned during pre-training and stimulate the proper knowledge for downstream tasks. We evaluate ELLE with streaming data from 5 domains on BERT and GPT. The results show the superiority of ELLE over various lifelong learning baselines in both pre-training efficiency and downstream performance.
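The core idea behind function-preserved expansion can be pictured with a Net2Net-style widening step: if hidden units are duplicated and their outgoing weights are split by the duplication count, the enlarged network computes exactly the same function as the original one, so pre-training can resume on new data from a larger model without discarding previously acquired knowledge. The NumPy sketch below is a minimal illustration of that principle only; `widen_function_preserving` and the toy two-layer MLP are hypothetical and are not the authors' released implementation.

```python
import numpy as np

def widen_function_preserving(W1, b1, W2, new_hidden):
    """Net2Net-style width expansion of a toy 2-layer MLP
    (hypothetical helper, not ELLE's actual code): duplicate
    existing hidden units and split their outgoing weights so
    the widened network computes the same function."""
    old_hidden = W1.shape[1]
    # map every new unit to an existing one; the first old_hidden
    # slots keep the original units, extra slots are random copies
    mapping = np.concatenate([
        np.arange(old_hidden),
        np.random.randint(0, old_hidden, new_hidden - old_hidden),
    ])
    # how many times each original unit appears after expansion
    counts = np.bincount(mapping, minlength=old_hidden)

    W1_new = W1[:, mapping]                             # copy incoming weights
    b1_new = b1[mapping]                                # copy biases
    W2_new = W2[mapping, :] / counts[mapping][:, None]  # split outgoing weights
    return W1_new, b1_new, W2_new

# sanity check: outputs are identical before and after expansion
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
W1, b1 = rng.normal(size=(8, 16)), rng.normal(size=16)
W2 = rng.normal(size=(16, 4))
relu = lambda h: np.maximum(h, 0.0)

y_old = relu(x @ W1 + b1) @ W2
W1n, b1n, W2n = widen_function_preserving(W1, b1, W2, new_hidden=24)
y_new = relu(x @ W1n + b1n) @ W2n
assert np.allclose(y_old, y_new)
```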
Pages: 2789 - 2810
Number of pages: 22