Pruning Pre-trained Language Models Without Fine-Tuning

Cited by: 0
Authors
Jiang, Ting [1 ]
Wang, Deqing [1 ,3 ]
Zhuang, Fuzhen [1 ,2 ,3 ]
Xie, Ruobing [4 ]
Xia, Feng [4 ]
Affiliations
[1] Beihang Univ, Sch Comp, SKLSDE Lab, Beijing, Peoples R China
[2] Beihang Univ, Inst Artificial Intelligence, Beijing, Peoples R China
[3] Zhongguancun Lab, Beijing, Peoples R China
[4] Tencent, WeChat, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
CLC Classification
TP18 [Artificial Intelligence Theory];
Subject Classification
081104; 0812; 0835; 1405;
Abstract
To overcome the over-parameterization problem in Pre-trained Language Models (PLMs), pruning is widely used as a simple and straightforward compression method that directly removes unimportant weights. Previous first-order methods successfully compress PLMs to extremely high sparsity with little performance drop. These methods, such as movement pruning, use first-order information to prune PLMs while fine-tuning the remaining weights. In this work, we argue that fine-tuning is redundant for first-order pruning, since first-order pruning alone is sufficient to converge PLMs to downstream tasks without fine-tuning. Motivated by this, we propose Static Model Pruning (SMP), which only uses first-order pruning to adapt PLMs to downstream tasks while achieving the target sparsity level. In addition, we design a new masking function and training objective to further improve SMP. Extensive experiments at various sparsity levels show that SMP yields significant improvements over first-order and zero-order methods. Unlike previous first-order methods, SMP is also applicable to low sparsity and outperforms zero-order methods. Meanwhile, SMP is more parameter efficient than other methods because it does not require fine-tuning. Our code is available at https://github.com/kongds/SMP.
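As a rough illustration of the mechanism the abstract describes, the sketch below shows first-order mask learning over frozen pre-trained weights: trainable importance scores select the top fraction of weights with a hard threshold, and a straight-through estimator lets gradients update the scores while the weights themselves are never fine-tuned. This is a minimal sketch assuming PyTorch; the names TopKBinarizer, MaskedLinear, and keep_ratio are illustrative assumptions and are not taken from the paper or its released code, and the paper's specific masking function and training objective are omitted.

import torch
import torch.nn as nn

class TopKBinarizer(torch.autograd.Function):
    # Hard top-k mask in the forward pass; straight-through gradient in the backward pass.
    @staticmethod
    def forward(ctx, scores, keep_ratio):
        k = max(1, int(keep_ratio * scores.numel()))
        threshold = torch.topk(scores.flatten(), k).values.min()
        return (scores >= threshold).float()

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: pass the incoming gradient to the scores unchanged.
        return grad_output, None

class MaskedLinear(nn.Module):
    # Linear layer whose pre-trained weight stays frozen; only the importance scores are trained.
    def __init__(self, linear, keep_ratio=0.2):
        super().__init__()
        self.weight = nn.Parameter(linear.weight.detach(), requires_grad=False)
        self.bias = None if linear.bias is None else nn.Parameter(linear.bias.detach(), requires_grad=False)
        self.scores = nn.Parameter(torch.zeros_like(self.weight))  # trainable first-order importance scores
        self.keep_ratio = keep_ratio                                # fraction of weights kept (1 - sparsity)

    def forward(self, x):
        mask = TopKBinarizer.apply(self.scores, self.keep_ratio)
        return nn.functional.linear(x, mask * self.weight, self.bias)

# Toy usage: wrap one pre-trained layer and optimize only its scores on a downstream loss.
layer = MaskedLinear(nn.Linear(768, 768), keep_ratio=0.2)
optimizer = torch.optim.AdamW([layer.scores], lr=1e-2)
loss = layer(torch.randn(4, 768)).pow(2).mean()  # stand-in for a downstream task loss
loss.backward()                                   # gradients reach the scores, not the frozen weights
optimizer.step()

The design point this sketch tries to capture is that adaptation to the downstream task comes entirely from which weights the learned mask keeps, so the number of trainable parameters is the score tensor rather than the full model.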
Pages: 594-605
Page count: 12
Related Papers
50 records in total
  • [1] Span Fine-tuning for Pre-trained Language Models
    Bao, Rongzhou
    Zhang, Zhuosheng
    Zhao, Hai
    [J]. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021, : 1970 - 1979
  • [2] Debiasing Pre-Trained Language Models via Efficient Fine-Tuning
    Gira, Michael
    Zhang, Ruisu
    Lee, Kangwook
    [J]. PROCEEDINGS OF THE SECOND WORKSHOP ON LANGUAGE TECHNOLOGY FOR EQUALITY, DIVERSITY AND INCLUSION (LTEDI 2022), 2022, : 59 - 69
  • [3] Pathologies of Pre-trained Language Models in Few-shot Fine-tuning
    Chen, Hanjie
    Zheng, Guoqing
    Awadallah, Ahmed Hassan
    Ji, Yangfeng
    [J]. PROCEEDINGS OF THE THIRD WORKSHOP ON INSIGHTS FROM NEGATIVE RESULTS IN NLP (INSIGHTS 2022), 2022, : 144 - 153
  • [4] Revisiting k-NN for Fine-Tuning Pre-trained Language Models
    Li, Lei
    Chen, Jing
    Tian, Botzhong
    Zhang, Ningyu
    [J]. CHINESE COMPUTATIONAL LINGUISTICS, CCL 2023, 2023, 14232 : 327 - 338
  • [5] Fine-Tuning Pre-Trained Language Models Effectively by Optimizing Subnetworks Adaptively
    Zhang, Haojie
    Li, Ge
    Li, Jia
    Zhang, Zhongjin
    Zhu, Yuqi
    Jin, Zhi
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [6] An Empirical Study on Hyperparameter Optimization for Fine-Tuning Pre-trained Language Models
    Liu, Xueqing
    Wang, Chi
    [J]. 59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 1 (ACL-IJCNLP 2021), 2021, : 2286 - 2300
  • [7] Towards Fine-tuning Pre-trained Language Models with Integer Forward and Backward Propagation
    Tayaranian, Mohammadreza
    Ghaffari, Alireza
    Tahaei, Marzieh S.
    Rezagholizadeh, Mehdi
    Asgharian, Masoud
    Nia, Vahid Partovi
    [J]. 17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023, 2023, : 1912 - 1921
  • [8] Fine-tuning Pre-Trained Transformer Language Models to Distantly Supervised Relation Extraction
    Alt, Christoph
    Huebner, Marc
    Hennig, Leonhard
    [J]. 57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 1388 - 1398
  • [9] Sentiment Analysis Using Pre-Trained Language Model With No Fine-Tuning and Less Resource
    Kit, Yuheng
    Mokji, Musa Mohd
    [J]. IEEE ACCESS, 2022, 10 : 107056 - 107065
  • [10] Disfluencies and Fine-Tuning Pre-trained Language Models for Detection of Alzheimer's Disease
    Yuan, Jiahong
    Bian, Yuchen
    Cai, Xingyu
    Huang, Jiaji
    Ye, Zheng
    Church, Kenneth
    [J]. INTERSPEECH 2020, 2020, : 2162 - 2166