Pruning Pre-trained Language Models Without Fine-Tuning

Cited by: 0
Authors
Jiang, Ting [1 ]
Wang, Deqing [1 ,3 ]
Zhuang, Fuzhen [1 ,2 ,3 ]
Xie, Ruobing [4 ]
Xia, Feng [4 ]
Affiliations
[1] Beihang Univ, Sch Comp, SKLSDE Lab, Beijing, Peoples R China
[2] Beihang Univ, Inst Artificial Intelligence, Beijing, Peoples R China
[3] Zhongguancun Lab, Beijing, Peoples R China
[4] Tencent, WeChat, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
CLC Classification
TP18 [Artificial Intelligence Theory];
Subject Classification
081104; 0812; 0835; 1405;
Abstract
To overcome the over-parameterization problem in Pre-trained Language Models (PLMs), pruning is widely used as a simple and straightforward compression method that directly removes unimportant weights. Previous first-order methods successfully compress PLMs to extremely high sparsity with little performance drop. These methods, such as movement pruning, use first-order information to prune PLMs while fine-tuning the remaining weights. In this work, we argue that fine-tuning is redundant for first-order pruning, since first-order pruning alone is sufficient to converge PLMs to downstream tasks without fine-tuning. Motivated by this, we propose Static Model Pruning (SMP), which only uses first-order pruning to adapt PLMs to downstream tasks while achieving the target sparsity level. In addition, we design a new masking function and training objective to further improve SMP. Extensive experiments at various sparsity levels show that SMP yields significant improvements over first-order and zero-order methods. Unlike previous first-order methods, SMP is also applicable to low sparsity and outperforms zero-order methods. Meanwhile, SMP is more parameter efficient than other methods because it does not require fine-tuning. Our code is available at https://github.com/kongds/SMP.
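As a rough illustration of the mechanism the abstract describes, the sketch below shows first-order mask learning over frozen pre-trained weights: trainable importance scores select the top fraction of weights with a hard threshold, and a straight-through estimator lets gradients update the scores while the weights themselves are never fine-tuned. This is a minimal sketch assuming PyTorch; the names TopKBinarizer, MaskedLinear, and keep_ratio are illustrative assumptions and are not taken from the paper or its released code, and the paper's specific masking function and training objective are omitted.

import torch
import torch.nn as nn

class TopKBinarizer(torch.autograd.Function):
    # Hard top-k mask in the forward pass; straight-through gradient in the backward pass.
    @staticmethod
    def forward(ctx, scores, keep_ratio):
        k = max(1, int(keep_ratio * scores.numel()))
        threshold = torch.topk(scores.flatten(), k).values.min()
        return (scores >= threshold).float()

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: pass the incoming gradient to the scores unchanged.
        return grad_output, None

class MaskedLinear(nn.Module):
    # Linear layer whose pre-trained weight stays frozen; only the importance scores are trained.
    def __init__(self, linear, keep_ratio=0.2):
        super().__init__()
        self.weight = nn.Parameter(linear.weight.detach(), requires_grad=False)
        self.bias = None if linear.bias is None else nn.Parameter(linear.bias.detach(), requires_grad=False)
        self.scores = nn.Parameter(torch.zeros_like(self.weight))  # trainable first-order importance scores
        self.keep_ratio = keep_ratio                                # fraction of weights kept (1 - sparsity)

    def forward(self, x):
        mask = TopKBinarizer.apply(self.scores, self.keep_ratio)
        return nn.functional.linear(x, mask * self.weight, self.bias)

# Toy usage: wrap one pre-trained layer and optimize only its scores on a downstream loss.
layer = MaskedLinear(nn.Linear(768, 768), keep_ratio=0.2)
optimizer = torch.optim.AdamW([layer.scores], lr=1e-2)
loss = layer(torch.randn(4, 768)).pow(2).mean()  # stand-in for a downstream task loss
loss.backward()                                   # gradients reach the scores, not the frozen weights
optimizer.step()

The design point this sketch tries to capture is that adaptation to the downstream task comes entirely from which weights the learned mask keeps, so the number of trainable parameters is the score tensor rather than the full model.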
Pages: 594-605
Page count: 12
Related Papers
50 records in total
  • [1] Span Fine-tuning for Pre-trained Language Models
    Bao, Rongzhou
    Zhang, Zhuosheng
    Zhao, Hai
    [J]. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021, : 1970 - 1979
  • [2] Debiasing Pre-Trained Language Models via Efficient Fine-Tuning
    Gira, Michael
    Zhang, Ruisu
    Lee, Kangwook
    [J]. PROCEEDINGS OF THE SECOND WORKSHOP ON LANGUAGE TECHNOLOGY FOR EQUALITY, DIVERSITY AND INCLUSION (LTEDI 2022), 2022, : 59 - 69
  • [3] Pathologies of Pre-trained Language Models in Few-shot Fine-tuning
    Chen, Hanjie
    Zheng, Guoqing
    Awadallah, Ahmed Hassan
    Ji, Yangfeng
    [J]. PROCEEDINGS OF THE THIRD WORKSHOP ON INSIGHTS FROM NEGATIVE RESULTS IN NLP (INSIGHTS 2022), 2022, : 144 - 153
  • [4] Revisiting k-NN for Fine-Tuning Pre-trained Language Models
    Li, Lei
    Chen, Jing
    Tian, Botzhong
    Zhang, Ningyu
    [J]. CHINESE COMPUTATIONAL LINGUISTICS, CCL 2023, 2023, 14232 : 327 - 338
  • [5] Fine-Tuning Pre-Trained Language Models Effectively by Optimizing Subnetworks Adaptively
    Zhang, Haojie
    Li, Ge
    Li, Jia
    Zhang, Zhongjin
    Zhu, Yuqi
    Jin, Zhi
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [6] An Empirical Study on Hyperparameter Optimization for Fine-Tuning Pre-trained Language Models
    Liu, Xueqing
    Wang, Chi
    [J]. 59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 1 (ACL-IJCNLP 2021), 2021, : 2286 - 2300
  • [7] Towards Fine-tuning Pre-trained Language Models with Integer Forward and Backward Propagation
    Tayaranian, Mohammadreza
    Ghaffari, Alireza
    Tahaei, Marzieh S.
    Rezagholizadeh, Mehdi
    Asgharian, Masoud
    Nia, Vahid Partovi
    [J]. 17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023, 2023, : 1912 - 1921
  • [8] Fine-tuning Pre-Trained Transformer Language Models to Distantly Supervised Relation Extraction
    Alt, Christoph
    Huebner, Marc
    Hennig, Leonhard
    [J]. 57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 1388 - 1398
  • [9] Sentiment Analysis Using Pre-Trained Language Model With No Fine-Tuning and Less Resource
    Kit, Yuheng
    Mokji, Musa Mohd
    [J]. IEEE ACCESS, 2022, 10 : 107056 - 107065
  • [10] Disfluencies and Fine-Tuning Pre-trained Language Models for Detection of Alzheimer's Disease
    Yuan, Jiahong
    Bian, Yuchen
    Cai, Xingyu
    Huang, Jiaji
    Ye, Zheng
    Church, Kenneth
    [J]. INTERSPEECH 2020, 2020, : 2162 - 2166