AdaDS: Adaptive data selection for accelerating pre-trained language model knowledge distillation

Cited by: 0
Authors
Zhou, Qinhong [1 ]
Li, Peng [2 ]
Liu, Yang [2 ]
Guan, Yuyang [3 ]
Xing, Qizhou [3 ]
Chen, Ming [3 ]
Sun, Maosong [1 ]
Liu, Yang [2 ]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, Beijing, Peoples R China
[2] Tsinghua Univ, Inst AI Ind Res, Beijing, Peoples R China
[3] Beijing Sinovoice Technol Co Ltd, Beijing, Peoples R China
Source
AI OPEN | 2023, Vol. 4
Keywords
Knowledge distillation; Pre-trained language model; Active learning;
DOI
10.1016/j.aiopen.2023.08.005
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Knowledge distillation (KD) is a widely used method for transferring knowledge from large teacher models to computationally efficient student models. Unfortunately, the computational cost of KD becomes unaffordable as pre-trained language models (PLMs) grow larger. Computing the KD loss on only part of the training set is a promising way to accelerate KD. However, existing works heuristically apply a single static data selection strategy throughout the KD process, yielding inconsistent improvements across distillation scenarios. In this work, we conduct a thorough study of various typical data selection strategies for KD and show that this inconsistency arises because the best data selection strategy depends on several factors, including the task, the selected data size, and the training stage. To adapt to these factors automatically, we propose a framework named AdaDS that learns to choose the data selection strategy adaptively during the KD process. Experimental results show that our method is effective across various tasks and selected data sizes in both the fine-tuning and pre-training stages, achieving performance comparable to DistilBERT with only 10% of the queries to the teacher model.
Pages: 56-63
Number of pages: 8
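To make the idea in the abstract concrete, below is a minimal, illustrative sketch of adaptive data selection for KD: a bandit-style controller picks one of several candidate selection heuristics at each step, the expensive teacher model is queried only on the selected subset, and the controller is updated from an observed reward. This is a sketch under stated assumptions, not the AdaDS algorithm: the strategy names (select_random, select_by_entropy, select_by_margin), the EXP3-style controller, and the reward signal are all hypothetical choices introduced here for illustration.

```python
# Illustrative sketch only: adaptively choosing among data-selection strategies for KD.
# The strategies, the EXP3 controller, and the reward design are assumptions for
# illustration; they are not taken from the AdaDS paper.
import math
import random
import torch
import torch.nn.functional as F


def select_random(student_logits, k):
    """Uniformly sample k examples from the batch."""
    return torch.randperm(student_logits.size(0))[:k]


def select_by_entropy(student_logits, k):
    """Prefer examples the student is most uncertain about (highest predictive entropy)."""
    probs = F.softmax(student_logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    return entropy.topk(k).indices


def select_by_margin(student_logits, k):
    """Prefer examples with the smallest margin between the student's top-2 classes."""
    top2 = student_logits.topk(2, dim=-1).values
    margin = top2[:, 0] - top2[:, 1]
    return (-margin).topk(k).indices


STRATEGIES = [select_random, select_by_entropy, select_by_margin]


class Exp3Controller:
    """EXP3 adversarial bandit over the candidate selection strategies."""

    def __init__(self, n_arms, gamma=0.1):
        self.gamma = gamma
        self.weights = [1.0] * n_arms

    def probs(self):
        total = sum(self.weights)
        n = len(self.weights)
        return [(1 - self.gamma) * w / total + self.gamma / n for w in self.weights]

    def choose(self):
        return random.choices(range(len(self.weights)), weights=self.probs())[0]

    def update(self, arm, reward):
        # Importance-weighted reward keeps the estimate unbiased under random arm choice.
        p = self.probs()[arm]
        self.weights[arm] *= math.exp(self.gamma * reward / (p * len(self.weights)))


def kd_step(student, teacher, batch, controller, k, temperature=2.0):
    """One distillation step on a batch tensor of model inputs (an assumption here):
    select k examples from student-side signals, query the teacher only on that subset,
    and distill on it."""
    # Cheap student forward pass on the full batch, reused for selection and the loss.
    student_logits = student(batch)

    arm = controller.choose()
    idx = STRATEGIES[arm](student_logits.detach(), k)

    # The expensive teacher forward pass happens only on the selected subset, which is
    # where the saving in teacher queries would come from in a setup like this.
    with torch.no_grad():
        teacher_logits = teacher(batch[idx])

    loss = F.kl_div(
        F.log_softmax(student_logits[idx] / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    # Placeholder reward in [0, 1]: strategies that surface harder examples
    # (larger KD loss on their subset) get reinforced.
    controller.update(arm, reward=min(loss.item(), 1.0))
    return loss
```

In this sketch, the subset size k controls how often the teacher is actually queried per batch, so setting k to roughly 10% of the batch size mirrors the query budget mentioned in the abstract; whether that matches the paper's actual protocol and reward design is not implied here.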