Extract then Distill: Efficient and Effective Task-Agnostic BERT Distillation

Cited by: 1
Authors
Chen, Cheng [1 ]
Yin, Yichun [2 ]
Shang, Lifeng [2 ]
Wang, Zhi [3 ,4 ]
Jiang, Xin [2 ]
Chen, Xiao [2 ]
Liu, Qun [2 ]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, Beijing, Peoples R China
[2] Huawei Noah's Ark Lab, Shenzhen, Peoples R China
[3] Tsinghua Univ, Tsinghua Shenzhen Int Grad Sch, Shenzhen, Peoples R China
[4] Peng Cheng Lab, Shenzhen, Peoples R China
Keywords
BERT; Knowledge distillation; Structured pruning;
DOI
10.1007/978-3-030-86365-4_46
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Task-agnostic knowledge distillation, a teacher-student framework, has proven effective for BERT compression. Although it achieves promising results on NLP tasks, it requires enormous computational resources. In this paper, we propose Extract Then Distill (ETD), a generic and flexible strategy that reuses the teacher's parameters for efficient and effective task-agnostic distillation and can be applied to students of any size. Specifically, we introduce two variants of ETD, ETD-Rand and ETD-Impt, which extract the teacher's parameters at random and according to an importance metric, respectively. In this way, the student has already acquired some knowledge at the start of distillation, which makes the distillation process converge faster. We demonstrate the effectiveness of ETD on the GLUE benchmark and SQuAD. The experimental results show that: (1) compared with the baseline without an ETD strategy, ETD saves 70% of the computation cost; moreover, it achieves better results than the baseline under the same compute budget. (2) ETD is generic and proves effective for different distillation methods (e.g., TinyBERT and MiniLM) and for students of different sizes. Code is available at https://github.com/huawei-noah/Pretrained-Language-Model.
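To make the extraction step concrete, below is a minimal PyTorch sketch of width extraction: a teacher weight matrix is sliced down to the student's width, keeping output neurons either uniformly at random (as in ETD-Rand) or by an importance score. The function names are illustrative, and the L1 row-norm score is a stand-in assumption for the paper's actual importance metric in ETD-Impt.

```python
import torch

def select_neurons(weight: torch.Tensor, n_keep: int, mode: str = "impt") -> torch.Tensor:
    """Choose which teacher output neurons the student keeps:
    uniformly at random (ETD-Rand) or by a score (stand-in for ETD-Impt)."""
    if mode == "rand":
        return torch.randperm(weight.size(0))[:n_keep]
    scores = weight.abs().sum(dim=1)     # L1 norm of each output row (assumed proxy metric)
    return scores.topk(n_keep).indices

def extract_linear(weight, bias, keep_out, keep_in=None):
    """Slice an (out_dim x in_dim) teacher linear layer down to student width."""
    w = weight[keep_out]
    if keep_in is not None:
        w = w[:, keep_in]                # also narrow the input side if the previous layer shrank
    return w.clone(), bias[keep_out].clone()

# Toy example: halve the intermediate width of a BERT-base FFN layer (3072 -> 1536).
teacher_w, teacher_b = torch.randn(3072, 768), torch.randn(3072)
keep = select_neurons(teacher_w, n_keep=1536, mode="impt")
student_w, student_b = extract_linear(teacher_w, teacher_b, keep)
print(student_w.shape, student_b.shape)  # torch.Size([1536, 768]) torch.Size([1536])
```

The extracted weights would then initialize the student before standard task-agnostic distillation begins, which is what lets training converge faster than from a random initialization.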
Pages: 570-581
Page count: 12