Towards Greener Yet Powerful Code Generation via Quantization: An Empirical Study

Cited by: 2
Authors
Wei, Xiaokai [1 ]
Gonugondla, Sujan Kumar [1 ]
Wang, Shiqi [1 ]
Ahmad, Wasi [1 ]
Ray, Baishakhi [1 ]
Qian, Haifeng [1 ]
Li, Xiaopeng [1 ]
Kumar, Varun [1 ]
Wang, Zijian [1 ]
Tian, Yuchen [1 ]
Sun, Qing [1 ]
Athiwaratkun, Ben [1 ]
Shang, Mingyue [1 ]
Ramanathan, Murali Krishna [1 ]
Bhatia, Parminder [1 ]
Xiang, Bing [1 ]
Affiliations
[1] AWS AI Labs, Palo Alto, CA 94303 USA
Keywords
Quantization; Code Generation; Large Language Models; Generative AI; Model Hosting;
DOI
10.1145/3611643.3616302
CLC Classification
TP31 [Computer Software];
Subject Classification
081202; 0835;
Abstract
ML-powered code generation aims to assist developers in writing code more productively by intelligently generating code blocks based on natural language prompts. Recently, large pretrained deep learning models have pushed the boundary of code generation and achieved impressive performance. However, the huge number of model parameters poses a significant challenge to their adoption in a typical software development environment, where a developer might use a standard laptop or mid-size server to develop code. Such large models cost significant resources in terms of memory, latency, dollars, as well as carbon footprint. Model compression is a promising approach to address these challenges. We have identified quantization as one of the most promising compression techniques for code generation, as it avoids expensive retraining costs. Because quantization represents model parameters with lower-bit integers (e.g., int8), both the model size and the runtime latency benefit. We empirically evaluate quantized models on code generation tasks across different dimensions: (i) resource usage and carbon footprint, (ii) accuracy, and (iii) robustness. Through systematic experiments, we find a code-aware quantization recipe that can run even a 6-billion-parameter model on a regular laptop without significant accuracy or robustness degradation. We find that the recipe is readily applicable to the code summarization task as well.
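For concreteness (this record does not spell out the paper's code-aware recipe), the sketch below illustrates the general technique the abstract describes: post-training dynamic int8 quantization, which rewrites weights as lower-bit integers without any retraining. The toy module and its dimensions are hypothetical stand-ins for a transformer feed-forward block, and the PyTorch flow shown is a generic one, not the authors' exact setup.

```python
import io

import torch
import torch.nn as nn

# Hypothetical stand-in for one transformer feed-forward block;
# real code-generation models stack many such blocks.
model = nn.Sequential(
    nn.Linear(512, 2048),
    nn.GELU(),
    nn.Linear(2048, 512),
)

# Post-training dynamic quantization: nn.Linear weights are stored as
# int8, and activations are quantized on the fly at inference time.
# No retraining or calibration pass is required.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

def size_mb(module: nn.Module) -> float:
    """Serialize a module in memory and report its size in MB."""
    buf = io.BytesIO()
    torch.save(module.state_dict(), buf)
    return buf.getbuffer().nbytes / 1e6

# int8 weights take roughly 4x less space than fp32, and integer
# kernels typically reduce inference latency on CPUs.
print(f"fp32: {size_mb(model):.2f} MB -> int8: {size_mb(quantized):.2f} MB")
```

This is why the abstract pairs quantization with both the memory and the latency dimensions: only the stored precision of the weights changes, so the compression comes without the retraining cost of other compression techniques.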
Pages: 224 - 236
Number of pages: 13