Domain Adaptive Code Completion via Language Models and Decoupled Domain Databases

Cited by: 1
Authors
Tang, Ze [1 ]
Ge, Jidong [1 ]
Liu, Shangqing [2 ]
Zhu, Tingwei [1 ]
Xu, Tongtong [3 ]
Huang, Liguo [4 ]
Luo, Bin [1 ]
Affiliations
[1] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing, Peoples R China
[2] Nanyang Technol Univ, Singapore, Singapore
[3] Huawei Software Engn Applicat Technol, Shenzhen, Peoples R China
[4] Southern Methodist Univ, Dept Comp Sci, Dallas, TX USA
Keywords
domain adaptive code completion; retrieval-augmented language model;
DOI
10.1109/ASE56229.2023.00076
CLC classification number
TP [Automation technology, computer technology];
Subject classification code
0812 ;
Abstract
Large Language Models (LLMs) have demonstrated remarkable performance in code completion. However, due to the lack of domain-specific knowledge, they may not be optimal at completing code that requires intensive domain knowledge, for example, completing library names. Although several works have confirmed the effectiveness of fine-tuning techniques for adapting language models to code completion in specific domains, they are limited by the need to repeatedly fine-tune the model as the project iterates. To address this limitation, in this paper we propose kNM-LM, a retrieval-augmented language model (R-LM) that integrates domain knowledge into language models without fine-tuning. Different from previous techniques, our approach automatically adapts to different language models and domains. Specifically, it utilizes in-domain code to build a retrieval-based database decoupled from the LM, and then combines it with the LM through Bayesian inference to complete the code. Extensive experiments on intra-project and intra-scenario completion confirm that kNM-LM brings appreciable enhancements over CodeGPT and UnixCoder. A deep analysis of our tool, covering response speed, storage usage, completion of specific code types, and API invocation completion, confirms that kNM-LM provides satisfactory performance, rendering it highly appropriate for domain adaptive code completion. Furthermore, our approach operates without direct access to the language model's parameters. As a result, it can seamlessly integrate with black-box code completion models, making it easy to deploy as a plugin that further enhances their performance.
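The retrieval-plus-LM combination the abstract describes can be sketched in the style of a kNN-LM: a datastore of (context-vector, next-token) pairs is built from in-domain code, queried at each completion step, and the resulting nearest-neighbor distribution is mixed with the base model's distribution. This is a minimal illustrative sketch only — the fixed interpolation weight `lam`, the toy 2-dimensional context vectors, and all function names are assumptions, standing in for the paper's actual Bayesian combination and learned representations.

```python
import math

def softmax(scores):
    """Turn a dict of real-valued scores into a probability distribution."""
    m = max(scores.values())
    exp = {t: math.exp(s - m) for t, s in scores.items()}
    z = sum(exp.values())
    return {t: v / z for t, v in exp.items()}

def sqdist(u, v):
    """Squared Euclidean distance between two context vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def knn_distribution(query, datastore, k=2):
    """Score candidate next tokens by the negative squared distance of
    their stored context keys to the query, keeping the k nearest
    datastore entries, then normalize with a softmax."""
    nearest = sorted(datastore, key=lambda kv: sqdist(query, kv[0]))[:k]
    scores = {}
    for key, token in nearest:
        d = sqdist(query, key)
        scores[token] = max(scores.get(token, -math.inf), -d)
    return softmax(scores)

def interpolate(p_lm, p_knn, lam=0.5):
    """Mix the base LM distribution with the retrieval distribution.
    A fixed weight lam stands in for kNM-LM's Bayesian combination."""
    vocab = set(p_lm) | set(p_knn)
    return {t: lam * p_knn.get(t, 0.0) + (1 - lam) * p_lm.get(t, 0.0)
            for t in vocab}

# Toy usage: a two-entry in-domain datastore nudges the completion
# toward a hypothetical domain-specific identifier.
datastore = [((0.0, 1.0), "domainLib"), ((1.0, 0.0), "print")]
p_knn = knn_distribution((0.0, 0.9), datastore, k=1)
p_lm = {"print": 0.8, "domainLib": 0.2}   # stand-in base-model probabilities
p_final = interpolate(p_lm, p_knn, lam=0.5)
```

Because the datastore is decoupled from the model, updating domain knowledge only means adding entries — no gradient updates touch the LM, which is what lets the technique wrap black-box completion models.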
Pages: 421 - 433
Page count: 13
Related papers
50 records total
  • [21] Malicious Domain Detection via Domain Relationship and Graph Models
    He, Wenxuan
    Gou, Gaopeng
    Kang, Cuicui
    Liu, Chang
    Li, Zhen
    Xiong, Gang
    2019 IEEE 38TH INTERNATIONAL PERFORMANCE COMPUTING AND COMMUNICATIONS CONFERENCE (IPCCC), 2019,
  • [22] Structure-Decoupled Adaptive Part Alignment Network for Domain Adaptive Mitochondria Segmentation
    Sun, Rui
    Mai, Huayu
    Luo, Naisong
    Zhang, Tianzhu
    Xiong, Zhiwei
    Wu, Feng
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2023, PT IV, 2023, 14223 : 523 - 533
  • [23] Cost-Efficient Domain-Adaptive Pretraining of Language Models for Optoelectronics Applications
    Huang, Dingyun
    Cole, Jacqueline M.
    JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2025, 65 (05) : 2476 - 2486
  • [24] Joint Spoken Language Understanding and Domain Adaptive Language Modeling
    Zhang, Huifeng
    Zhu, Su
    Fan, Shuai
    Yu, Kai
    INTELLIGENCE SCIENCE AND BIG DATA ENGINEERING, 2018, 11266 : 311 - 324
  • [25] Using out-of-domain data to improve on-domain language models
    Iyer, R
    Ostendorf, M
    Gish, H
    IEEE SIGNAL PROCESSING LETTERS, 1997, 4 (08) : 221 - 223
  • [26] Generating Database Access Code From Domain Models
    Khelifi, Nassima Yamouni
    Smialek, Michal
    Mekki, Rachida
    PROCEEDINGS OF THE 2015 FEDERATED CONFERENCE ON COMPUTER SCIENCE AND INFORMATION SYSTEMS, 2015, 5 : 991 - 996
  • [27] Quantifying Domain Knowledge in Large Language Models
    Sayenju, Sudhashree
    Aygun, Ramazan
    Franks, Bill
    Johnston, Sereres
    Lee, George
    Choi, Hansook
    Modgil, Girish
    2023 IEEE CONFERENCE ON ARTIFICIAL INTELLIGENCE, CAI, 2023, : 193 - 194
  • [28] Using AI-Based Code Completion for Domain-Specific Languages
    Piereder, Christina
    Fleck, Guenter
    Geist, Verena
    Moser, Michael
    Pichler, Josef
    PRODUCT-FOCUSED SOFTWARE PROCESS IMPROVEMENT, PROFES 2023, PT I, 2024, 14483 : 227 - 242
  • [29] An Empirical Investigation on the Performance of Domain Adaptation for T5 Code Completion
    Fukumoto, Daisuke
    Kashiwa, Yutaro
    Hirao, Toshiki
    Fujiwara, Kenji
    Iida, Hajimu
    2023 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE ANALYSIS, EVOLUTION AND REENGINEERING, SANER, 2023, : 693 - 697
  • [30] RASCAL: a Domain Specific Language for Source Code Analysis and Manipulation
    Klint, Paul
    van der Storm, Tijs
    Vinju, Jurgen
    2009 NINTH IEEE INTERNATIONAL WORKING CONFERENCE ON SOURCE CODE ANALYSIS AND MANIPULATION, PROCEEDINGS, 2009, : 168 - +