Gradient-based Intra-attention Pruning on Pre-trained Language Models

Cited by: 0
Authors:
Yang, Ziqing [1]
Cui, Yiming [1,2]
Yao, Xin [1]
Wang, Shijin [1,3]
Affiliations:
[1] IFLYTEK Res, State Key Lab Cognit Intelligence, Beijing, Peoples R China
[2] Harbin Inst Technol, Res Ctr SCIR, Harbin, Peoples R China
[3] IFLYTEK AI Res Cent China, Wuhan, Peoples R China
Keywords: (not listed)
DOI: Not available
CLC Classification: TP18 [Artificial Intelligence Theory]
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract:
Pre-trained language models achieve superior performance but are computationally expensive. Techniques such as pruning and knowledge distillation have been developed to reduce their sizes and latencies. In this work, we propose GRAIN (Gradient-based Intra-attention pruning), a structured pruning method that performs task-specific pruning with knowledge distillation and yields highly effective models. Unlike common approaches that prune each attention head as a whole, GRAIN inspects and prunes intra-attention structures, which greatly expands the structure search space and enables more flexible models. We also propose a gradient separation strategy that reduces the interference of distillation on pruning, allowing the two approaches to be combined more effectively. Experiments on GLUE, SQuAD, and CoNLL 2003 show that GRAIN notably outperforms other methods, especially in the high-sparsity regime, and achieves 6~7x speedups while maintaining 93%~99% of the original performance. Under extreme compression, where only 3% of transformer weights remain, the pruned model remains competitive with larger models.
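To make the core idea concrete, the sketch below scores individual intra-attention dimensions (rather than whole heads) with a gradient-based, first-order importance criterion. This is a minimal PyTorch sketch under assumed names (TinySelfAttention, dim_mask are illustrative), not the paper's actual GRAIN implementation; it omits the distillation and gradient-separation components described in the abstract.

# Minimal sketch: gradient-based importance at intra-attention granularity.
# All names here are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn

class TinySelfAttention(nn.Module):
    def __init__(self, hidden=64, n_heads=4):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, hidden // n_heads
        self.q = nn.Linear(hidden, hidden)
        self.k = nn.Linear(hidden, hidden)
        self.v = nn.Linear(hidden, hidden)
        self.o = nn.Linear(hidden, hidden)
        # One gate per intra-attention dimension (n_heads x head_dim),
        # instead of one gate per head: a much finer pruning search space.
        self.dim_mask = nn.Parameter(torch.ones(n_heads, self.head_dim))

    def forward(self, x):
        b, t, _ = x.shape
        def split(z):
            return z.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        q, k, v = split(self.q(x)), split(self.k(x)), split(self.v(x))
        att = torch.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)
        ctx = att @ v                                # (b, heads, t, head_dim)
        ctx = ctx * self.dim_mask[None, :, None, :]  # gate each intra-attention dim
        return self.o(ctx.transpose(1, 2).reshape(b, t, -1))

attn = TinySelfAttention()
x = torch.randn(2, 8, 64)
loss = attn(x).pow(2).mean()   # stand-in for a task loss
loss.backward()
# First-order (Taylor) importance of removing each dimension: |grad * mask|.
importance = (attn.dim_mask.grad * attn.dim_mask).abs()  # shape (n_heads, head_dim)
print(importance.shape)        # prune the lowest-scoring dimensions first

In this toy setting, whole-head pruning would correspond to summing the scores over head_dim and dropping entire rows; scoring per dimension instead allows a head to be only partially pruned, which is the flexibility the abstract attributes to intra-attention pruning.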
Pages: 2775-2790 (16 pages)
Related Papers (50 records):
  • [1] Structured Pruning for Efficient Generative Pre-trained Language Models
    Tao, Chaofan
    Hou, Lu
    Bai, Haoli
    Wei, Jiansheng
    Jiang, Xin
    Liu, Qun
    Luo, Ping
    Wong, Ngai
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 10880 - 10895
  • [2] TextPruner: A Model Pruning Toolkit for Pre-Trained Language Models
    Yang, Ziqing
    Cui, Yiming
    Chen, Zhigang
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022): PROCEEDINGS OF SYSTEM DEMONSTRATIONS, 2022, : 35 - 43
  • [3] Pruning Pre-trained Language Models with Principled Importance and Self-regularization
    Ren, Siyu
    Zhu, Kenny Q.
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 8995 - 9008
  • [4] Pre-Trained Language Models and Their Applications
    Wang, Haifeng
    Li, Jiwei
    Wu, Hua
    Hovy, Eduard
    Sun, Yu
    ENGINEERING, 2023, 25 : 51 - 65
  • [5] Pruning Pre-trained Language Models Without Fine-Tuning
    Jiang, Ting
    Wang, Deqing
    Zhuang, Fuzhen
    Xie, Ruobing
    Xia, Feng
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023, : 594 - 605
  • [6] APrompt: Attention Prompt Tuning for Efficient Adaptation of Pre-trained Language Models
    Wang, Qifan
    Mao, Yuning
    Wang, Jingang
    Yu, Hanchao
    Li, Shaoliang
    Wang, Sinong
    Feng, Fuli
    Huang, Lifu
    Quan, Xiaojun
    Xu, Zenglin
    Liu, Dongfang
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2023), 2023, : 9147 - 9160
  • [7] A Data Cartography based MixUp for Pre-trained Language Models
    Park, Seo Yeon
    Caragea, Cornelia
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, : 4244 - 4250
  • [8] Pre-trained transformer-based language models for Sundanese
    Wongso, Wilson
    Lucky, Henry
    Suhartono, Derwin
    JOURNAL OF BIG DATA, 2022, 9 (01)
  • [9] Annotating Columns with Pre-trained Language Models
    Suhara, Yoshihiko
    Li, Jinfeng
    Li, Yuliang
    Zhang, Dan
    Demiralp, Cagatay
    Chen, Chen
    Tan, Wang-Chiew
    PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA (SIGMOD '22), 2022, : 1493 - 1503