AdaDS: Adaptive data selection for accelerating pre-trained language model knowledge distillation

Cited by: 0
Authors
Zhou, Qinhong [1 ]
Li, Peng [2 ]
Liu, Yang [2 ]
Guan, Yuyang [3 ]
Xing, Qizhou [3 ]
Chen, Ming [3 ]
Sun, Maosong [1 ]
Liu, Yang [2 ]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, Beijing, Peoples R China
[2] Tsinghua Univ, Inst AI Ind Res, Beijing, Peoples R China
[3] Beijing Sinovoice Technol Co Ltd, Beijing, Peoples R China
Source
AI OPEN | 2023, Vol. 4
Keywords
Knowledge distillation; Pre-trained language model; Active learning
DOI
10.1016/j.aiopen.2023.08.005
Chinese Library Classification (CLC) Number
TP18 [Theory of Artificial Intelligence]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Knowledge distillation (KD) is a widely used method for transferring knowledge from large teacher models to computationally efficient student models. Unfortunately, the computational cost of KD becomes unaffordable as pre-trained language models (PLMs) grow larger. Computing the KD loss on only part of the training set is a promising way to accelerate KD. However, existing works heuristically apply a single static data selection strategy throughout the KD process, which yields inconsistent improvements across distillation scenarios. In this work, we conduct a thorough study of typical data selection strategies for KD and show that this inconsistency arises because the best data selection strategy depends on several factors, including the task, the selected data size, and the training stage. To adapt to these factors automatically, we propose AdaDS, a framework that learns to choose the data selection strategy adaptively during the KD process. Experimental results show that the proposed method is effective across tasks and selected data sizes in both the fine-tuning and pre-training stages, achieving performance comparable to DistilBERT with only 10% of the queries to the teacher model.
Pages: 56-63
Number of pages: 8
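
The abstract describes AdaDS as learning, during distillation, which data selection strategy should pick the examples on which the KD loss is computed, since the best strategy shifts with the task, the selected data size, and the training stage. Below is a minimal, self-contained sketch of that general idea only: a toy logistic-regression teacher and student, three candidate selection strategies, and an EXP3-style bandit whose reward is the student's validation-loss improvement after each distillation step. The particular strategies (random, teacher uncertainty, teacher-student disagreement), the reward definition, and all hyper-parameters are illustrative assumptions for this sketch, not the algorithm or settings from the paper.

# Hypothetical sketch: bandit-driven choice among data selection strategies for KD.
# Toy stand-ins only; this is not the AdaDS implementation.
import numpy as np

rng = np.random.default_rng(0)

# ---- Toy data and models (stand-ins for a PLM teacher and student) ----
d, n_train, n_val = 20, 2000, 500
X_train = rng.normal(size=(n_train, d))
X_val = rng.normal(size=(n_val, d))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

teacher_w = rng.normal(size=d)      # fixed "large" teacher
student_w = np.zeros(d)             # student to be distilled

def teacher_probs(X):
    return sigmoid(X @ teacher_w)

def student_probs(X, w):
    return sigmoid(X @ w)

def kd_loss(w, X):
    # Binary KL divergence between teacher and student predictions (soft-label KD loss).
    t = np.clip(teacher_probs(X), 1e-6, 1 - 1e-6)
    s = np.clip(student_probs(X, w), 1e-6, 1 - 1e-6)
    return np.mean(t * np.log(t / s) + (1 - t) * np.log((1 - t) / (1 - s)))

def kd_step(w, X_sel, lr=0.5):
    # One gradient step of the student on the KD loss over the selected subset.
    grad = X_sel.T @ (student_probs(X_sel, w) - teacher_probs(X_sel)) / len(X_sel)
    return w - lr * grad

# ---- Candidate data selection strategies (the bandit's arms) ----
def select_random(X_pool, w, k):
    return rng.choice(len(X_pool), size=k, replace=False)

def select_uncertain(X_pool, w, k):
    # Examples on which the teacher is least confident.
    return np.argsort(np.abs(teacher_probs(X_pool) - 0.5))[:k]

def select_disagreement(X_pool, w, k):
    # Examples on which student and teacher disagree most.
    return np.argsort(-np.abs(teacher_probs(X_pool) - student_probs(X_pool, w)))[:k]

strategies = [select_random, select_uncertain, select_disagreement]

# ---- EXP3 bandit choosing a strategy at every distillation step ----
gamma = 0.1
weights = np.ones(len(strategies))
pool_size, k, steps = 256, 32, 200

for _ in range(steps):
    probs = (1 - gamma) * weights / weights.sum() + gamma / len(strategies)
    arm = rng.choice(len(strategies), p=probs)

    # Only the selected k examples are used to compute the KD loss this step.
    X_pool = X_train[rng.choice(n_train, size=pool_size, replace=False)]
    selected = strategies[arm](X_pool, student_w, k)

    before = kd_loss(student_w, X_val)
    student_w = kd_step(student_w, X_pool[selected])
    after = kd_loss(student_w, X_val)

    # Reward: relative improvement of the student, clipped to [0, 1].
    reward = np.clip((before - after) / max(before, 1e-8), 0.0, 1.0)
    weights[arm] *= np.exp(gamma * reward / (probs[arm] * len(strategies)))

print("final validation KD loss:", kd_loss(student_w, X_val))
print("learned strategy preferences:", weights / weights.sum())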