Compressing Pre-trained Models of Code into 3 MB

Cited by: 6
Authors
Shi, Jieke [1 ]
Yang, Zhou [1 ]
Xu, Bowen [1 ]
Kang, Hong Jin [1 ]
Lo, David [1 ]
Affiliation
[1] Singapore Management Univ, Sch Comp & Informat Syst, Singapore, Singapore
Funding
National Research Foundation, Singapore;
Keywords
Model Compression; Genetic Algorithm; Pre-Trained Models;
DOI
10.1145/3551349.3556964
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Although large pre-trained models of code have delivered significant advancements in various code processing tasks, there is an impediment to the wide and fluent adoption of these powerful models in software developers' daily workflows: these large models consume hundreds of megabytes of memory and run slowly on personal devices, which causes problems in model deployment and greatly degrades the user experience. This motivates us to propose Compressor, a novel approach that can compress pre-trained models of code into extremely small models with negligible performance sacrifice. Our proposed method formulates the design of tiny models as simplifying the pre-trained model architecture: searching for a significantly smaller model that follows an architectural design similar to the original pre-trained model. Compressor uses a genetic algorithm (GA)-based strategy to guide the simplification process. Prior studies found that a model with higher computational cost tends to be more powerful. Inspired by this insight, the GA is designed to maximize a model's Giga floating-point operations (GFLOPs), an indicator of the model's computational cost, subject to the constraint on the target model size. Then, we use the knowledge distillation technique to train the small model: unlabelled data is fed into the large model, and its outputs are used as labels to train the small model. We evaluate Compressor with two state-of-the-art pre-trained models, i.e., CodeBERT and GraphCodeBERT, on two important tasks, i.e., vulnerability prediction and clone detection. We use our method to compress the pre-trained models to a size of 3 MB, which is 160x smaller than their original size. The results show that compressed CodeBERT and GraphCodeBERT are 4.31x and 4.15x faster than the original models at inference, respectively. More importantly, they maintain 96.15% and 97.74% of the original performance on the vulnerability prediction task. They even maintain higher ratios (99.20% and 97.52%) of the original performance on the clone detection task.
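To make the search strategy described in the abstract concrete, below is a minimal, self-contained Python sketch of a genetic algorithm that maximizes an estimated GFLOPs value subject to a model-size budget. The search space, the FLOPs and size estimates, and all GA settings are illustrative assumptions made for this example; they are not taken from Compressor's actual implementation.

# Minimal, illustrative sketch of a GA-guided architecture search of the kind
# described in the abstract. The search space, the FLOPs/size estimates, and
# all GA settings are assumptions for this example, not values from Compressor.
import random

SIZE_BUDGET_MB = 3.0   # target model size (as in the paper's title)
VOCAB = 1000           # assumed (reduced) vocabulary size
SEQ_LEN = 128          # assumed input sequence length

SEARCH_SPACE = {
    "layers": [1, 2, 3, 4, 6],
    "hidden": [64, 96, 128, 192, 256],
    "ffn":    [128, 256, 512, 1024],
}

def param_count(gene):
    """Rough parameter count of a Transformer encoder with these genes."""
    embedding = VOCAB * gene["hidden"]
    per_layer = 4 * gene["hidden"] ** 2 + 2 * gene["hidden"] * gene["ffn"]
    return embedding + gene["layers"] * per_layer

def size_mb(gene):
    return param_count(gene) * 4 / 1e6   # fp32 weights, 4 bytes per parameter

def gflops(gene):
    """Very rough per-sequence FLOPs estimate (matrix multiplications only)."""
    per_layer = 2 * SEQ_LEN * (4 * gene["hidden"] ** 2 + 2 * gene["hidden"] * gene["ffn"])
    return gene["layers"] * per_layer / 1e9

def fitness(gene):
    # Maximize computational cost (GFLOPs) while respecting the size budget.
    return gflops(gene) if size_mb(gene) <= SIZE_BUDGET_MB else -1.0

def random_gene():
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def crossover(a, b):
    return {k: random.choice([a[k], b[k]]) for k in SEARCH_SPACE}

def mutate(gene):
    child = dict(gene)
    key = random.choice(list(SEARCH_SPACE))
    child[key] = random.choice(SEARCH_SPACE[key])
    return child

def search(pop_size=20, generations=50):
    population = [random_gene() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=fitness)

if __name__ == "__main__":
    best = search()
    print(best, f"{size_mb(best):.2f} MB", f"{gflops(best):.3f} GFLOPs")

According to the abstract, the architecture found by such a search is then trained with knowledge distillation: unlabelled inputs are fed to the original model and its outputs serve as labels for the small model; that distillation loop is omitted from this sketch.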
Pages: 12