Domain Adaptive Code Completion via Language Models and Decoupled Domain Databases

Cited by: 1
Authors
Tang, Ze [1 ]
Ge, Jidong [1 ]
Liu, Shangqing [2 ]
Zhu, Tingwei [1 ]
Xu, Tongtong [3 ]
Huang, Liguo [4 ]
Luo, Bin [1 ]
Affiliations
[1] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing, Peoples R China
[2] Nanyang Technol Univ, Singapore, Singapore
[3] Huawei Software Engn Applicat Technol, Shenzhen, Peoples R China
[4] Southern Methodist Univ, Dept Comp Sci, Dallas, TX USA
Keywords
domain adaptive code completion; retrieval-augmented language model
DOI
10.1109/ASE56229.2023.00076
CLC Classification
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Large Language Models (LLMs) have demonstrated remarkable performance in code completion. However, lacking domain-specific knowledge, they can be suboptimal at completions that require intensive domain knowledge, for example, completing library names. Several works have confirmed the effectiveness of fine-tuning for adapting language models to code completion in specific domains, but fine-tuned models must be retrained whenever the project iterates. To address this limitation, we propose kNM-LM, a retrieval-augmented language model (R-LM) that integrates domain knowledge into language models without fine-tuning. Unlike previous techniques, our approach automatically adapts to different language models and domains. Specifically, it uses in-domain code to build a retrieval-based database decoupled from the LM, and then combines the database with the LM through Bayesian inference to complete the code. Extensive experiments on intra-project and intra-scenario completion confirm that kNM-LM brings appreciable improvements over CodeGPT and UnixCoder. A deeper analysis of our tool, covering response speed, storage usage, completion of specific token types, and API invocation completion, confirms that kNM-LM performs satisfactorily, making it well suited for domain adaptive code completion. Furthermore, our approach operates without direct access to the language model's parameters, so it can be attached to black-box code completion models as a plugin to further enhance their performance.
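The retrieval-plus-LM combination described in the abstract can be illustrated with a short sketch. The Python below is a minimal kNN-LM-style decoding step, not the paper's exact method: kNM-LM combines the two distributions through Bayesian inference, whereas this sketch uses a fixed interpolation weight LAM; the datastore contents, distance kernel, and helper names (knn_distribution, combined_next_token) are illustrative assumptions, and the LM's hidden state and token probabilities are stubbed with random values in place of a real model such as CodeGPT.

import numpy as np

VOCAB = 1000    # toy vocabulary size
DIM = 64        # hidden-state dimensionality
K = 8           # neighbors retrieved per decoding step
LAM = 0.25      # fixed interpolation weight (illustrative assumption)

rng = np.random.default_rng(0)

# Datastore built offline from in-domain code, decoupled from the LM:
# one (context hidden state, observed next-token id) pair per position.
keys = rng.normal(size=(5000, DIM)).astype(np.float32)
values = rng.integers(0, VOCAB, size=5000)

def knn_distribution(query):
    # Squared L2 distance from the query context to every datastore key.
    d2 = ((keys - query) ** 2).sum(axis=1)
    idx = np.argpartition(d2, K)[:K]          # indices of the K nearest keys
    d = d2[idx]
    w = np.exp(-(d - d.min()))                # stabilized softmax over -distance
    w /= w.sum()
    p = np.zeros(VOCAB)
    np.add.at(p, values[idx], w)              # aggregate neighbor votes per token id
    return p

def combined_next_token(hidden_state, lm_probs):
    # Mix the retrieval distribution with the LM distribution, then decode.
    p = LAM * knn_distribution(hidden_state) + (1 - LAM) * lm_probs
    return int(p.argmax())

# One decoding step with stubbed LM outputs standing in for a real model.
hidden_state = rng.normal(size=DIM).astype(np.float32)
lm_probs = rng.dirichlet(np.ones(VOCAB))
print(combined_next_token(hidden_state, lm_probs))

The point of the decoupled design is visible here: the datastore (keys, values) can be rebuilt from the project's latest code at any time without touching the LM's weights, which is why the approach needs no fine-tuning and works with black-box completion models.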
Pages: 421-433
Page count: 13