Gradient-based Intra-attention Pruning on Pre-trained Language Models

Cited by: 0
Authors:
Yang, Ziqing [1]
Cui, Yiming [1,2]
Yao, Xin [1]
Wang, Shijin [1,3]
Affiliations:
[1] IFLYTEK Res, State Key Lab Cognit Intelligence, Beijing, Peoples R China
[2] Harbin Inst Technol, Res Ctr SCIR, Harbin, Peoples R China
[3] IFLYTEK AI Res Cent China, Wuhan, Peoples R China
Keywords: (not listed)
DOI: Not available
CLC Classification: TP18 [Artificial Intelligence Theory]
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract:
Pre-trained language models achieve superior performance but are computationally expensive. Techniques such as pruning and knowledge distillation have been developed to reduce their sizes and latencies. In this work, we propose GRAIN (Gradient-based Intra-attention pruning), a structured pruning method that performs task-specific pruning with knowledge distillation and yields highly effective models. Unlike common approaches that prune each attention head as a whole, GRAIN inspects and prunes intra-attention structures, which greatly expands the structure search space and enables more flexible models. We also propose a gradient separation strategy that reduces the interference of distillation on pruning, allowing the two approaches to be combined more effectively. Experiments on GLUE, SQuAD, and CoNLL 2003 show that GRAIN notably outperforms other methods, especially in the high-sparsity regime, and achieves 6~7x speedups while maintaining 93%~99% of the original performance. Under extreme compression, where only 3% of transformer weights remain, the pruned model remains competitive with larger models.
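To make the core idea concrete, the sketch below scores individual intra-attention dimensions (rather than whole heads) with a gradient-based, first-order importance criterion. This is a minimal PyTorch sketch under assumed names (TinySelfAttention, dim_mask are illustrative), not the paper's actual GRAIN implementation; it omits the distillation and gradient-separation components described in the abstract.

# Minimal sketch: gradient-based importance at intra-attention granularity.
# All names here are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn

class TinySelfAttention(nn.Module):
    def __init__(self, hidden=64, n_heads=4):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, hidden // n_heads
        self.q = nn.Linear(hidden, hidden)
        self.k = nn.Linear(hidden, hidden)
        self.v = nn.Linear(hidden, hidden)
        self.o = nn.Linear(hidden, hidden)
        # One gate per intra-attention dimension (n_heads x head_dim),
        # instead of one gate per head: a much finer pruning search space.
        self.dim_mask = nn.Parameter(torch.ones(n_heads, self.head_dim))

    def forward(self, x):
        b, t, _ = x.shape
        def split(z):
            return z.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        q, k, v = split(self.q(x)), split(self.k(x)), split(self.v(x))
        att = torch.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)
        ctx = att @ v                                # (b, heads, t, head_dim)
        ctx = ctx * self.dim_mask[None, :, None, :]  # gate each intra-attention dim
        return self.o(ctx.transpose(1, 2).reshape(b, t, -1))

attn = TinySelfAttention()
x = torch.randn(2, 8, 64)
loss = attn(x).pow(2).mean()   # stand-in for a task loss
loss.backward()
# First-order (Taylor) importance of removing each dimension: |grad * mask|.
importance = (attn.dim_mask.grad * attn.dim_mask).abs()  # shape (n_heads, head_dim)
print(importance.shape)        # prune the lowest-scoring dimensions first

In this toy setting, whole-head pruning would correspond to summing the scores over head_dim and dropping entire rows; scoring per dimension instead allows a head to be only partially pruned, which is the flexibility the abstract attributes to intra-attention pruning.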
Pages: 2775-2790 (16 pages)
Related Papers (50 records):
  • [1] Structured Pruning for Efficient Generative Pre-trained Language Models
    Tao, Chaofan
    Hou, Lu
    Bai, Haoli
    Wei, Jiansheng
    Jiang, Xin
    Liu, Qun
    Luo, Ping
    Wong, Ngai
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 10880 - 10895
  • [2] TextPruner: A Model Pruning Toolkit for Pre-Trained Language Models
    Yang, Ziqing
    Cui, Yiming
    Chen, Zhigang
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022): PROCEEDINGS OF SYSTEM DEMONSTRATIONS, 2022, : 35 - 43
  • [3] Pruning Pre-trained Language Models with Principled Importance and Self-regularization
    Ren, Siyu
    Zhu, Kenny Q.
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 8995 - 9008
  • [4] Pre-Trained Language Models and Their Applications
    Wang, Haifeng
    Li, Jiwei
    Wu, Hua
    Hovy, Eduard
    Sun, Yu
    ENGINEERING, 2023, 25 : 51 - 65
  • [5] Pruning Pre-trained Language Models Without Fine-Tuning
    Jiang, Ting
    Wang, Deqing
    Zhuang, Fuzhen
    Xie, Ruobing
    Xia, Feng
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023, : 594 - 605
  • [6] APrompt: Attention Prompt Tuning for Efficient Adaptation of Pre-trained Language Models
    Wang, Qifan
    Mao, Yuning
    Wang, Jingang
    Yu, Hanchao
    Li, Shaoliang
    Wang, Sinong
    Feng, Fuli
    Huang, Lifu
    Quan, Xiaojun
    Xu, Zenglin
    Liu, Dongfang
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2023), 2023, : 9147 - 9160
  • [7] A Data Cartography based MixUp for Pre-trained Language Models
    Park, Seo Yeon
    Caragea, Cornelia
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, : 4244 - 4250
  • [8] Pre-trained transformer-based language models for Sundanese
    Wongso, Wilson
    Lucky, Henry
    Suhartono, Derwin
    JOURNAL OF BIG DATA, 2022, 9 (01)
  • [9] Annotating Columns with Pre-trained Language Models
    Suhara, Yoshihiko
    Li, Jinfeng
    Li, Yuliang
    Zhang, Dan
    Demiralp, Cagatay
    Chen, Chen
    Tan, Wang-Chiew
    PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA (SIGMOD '22), 2022, : 1493 - 1503