Fine-tuning Language Models for Joint Rewriting and Completion of Code with Potential Bugs

Cited by: 0
Authors:
Wang, Dingmin [1]
Zhao, Jinman [2]
Pei, Hengzhi [2]
Tan, Samson [3]
Zha, Sheng [3]
Affiliations:
[1] University of Oxford, Oxford, England
[2] Amazon Web Services, Seattle, WA, USA
[3] Amazon AGI, Seattle, WA, USA
Keywords: none listed
DOI: not available
CLC Classification: TP18 [Artificial Intelligence Theory]
Discipline Codes: 081104; 0812; 0835; 1405
Abstract:
Handling drafty partial code remains a notable challenge in real-time code suggestion applications. Previous work has demonstrated the shortcomings of large language models of code (CodeLLMs) in completing partial code with potential bugs. In this study, we view partial code as implementation hints and fine-tune CodeLLMs to jointly rewrite and complete partial code into functional full programs. We explore two strategies: one-pass generation and multi-pass iterative refinement. We construct new training and testing datasets using semantic-altering code transformations and iterative self-generations. We conduct comprehensive experiments over three representative open-source CodeLLMs: InCoder, CodeGen, and StarCoder. Results show that CodeLLMs fine-tuned with our approach achieve superior pass rates compared to previous baselines across existing and newly created benchmarks, effectively handle both potentially buggy and clean code, and largely preserve the integrity of the original partial implementations. We further present findings on the properties of the potential bugs we tested and on the design choices of our methods.
Pages: 15854-15868 (15 pages)
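
As a rough illustration of the two strategies the abstract names, the sketch below shows how a fine-tuned CodeLLM could be driven in a one-pass and a multi-pass loop. This is not the paper's released code: the checkpoint name, decoding settings, and the `passes_tests` callback are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's implementation) of
# rewrite-and-complete with a CodeLLM, in one-pass and multi-pass form.
from transformers import AutoModelForCausalLM, AutoTokenizer

# Stand-in checkpoint; the paper fine-tunes InCoder, CodeGen, and StarCoder.
MODEL_NAME = "bigcode/starcoderbase-1b"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)


def rewrite_and_complete(partial_code: str, max_new_tokens: int = 256) -> str:
    """One-pass strategy: treat the (possibly buggy) partial code as an
    implementation hint. A model fine-tuned for joint rewriting and
    completion is expected to emit a full program rather than a verbatim
    continuation of the prompt."""
    inputs = tokenizer(partial_code, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.4,
        pad_token_id=tokenizer.eos_token_id,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


def iterative_refinement(partial_code: str, passes_tests, max_passes: int = 3) -> str:
    """Multi-pass strategy: feed each candidate back in as the next draft
    until it passes the tests or the pass budget is exhausted."""
    draft = partial_code
    for _ in range(max_passes):
        candidate = rewrite_and_complete(draft)
        if passes_tests(candidate):  # e.g. run the benchmark's unit tests
            return candidate
        draft = candidate  # refine the previous self-generation
    return draft
```

In practice `passes_tests` would execute the benchmark's unit tests in a sandbox; the one-pass strategy corresponds to a single call to `rewrite_and_complete`.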