Domain Adaptive Code Completion via Language Models and Decoupled Domain Databases

Cited by: 1
Authors
Tang, Ze [1 ]
Ge, Jidong [1 ]
Liu, Shangqing [2 ]
Zhu, Tingwei [1 ]
Xu, Tongtong [3 ]
Huang, Liguo [4 ]
Luo, Bin [1 ]
Affiliations
[1] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing, Peoples R China
[2] Nanyang Technol Univ, Singapore, Singapore
[3] Huawei Software Engn Applicat Technol, Shenzhen, Peoples R China
[4] Southern Methodist Univ, Dept Comp Sci, Dallas, TX USA
Keywords
domain adaptive code completion; retrieval-augmented language model
DOI
10.1109/ASE56229.2023.00076
Chinese Library Classification
TP [Automation technology; computer technology]
Subject Classification Code
0812
Abstract
Large Language Models (LLMs) have demonstrated remarkable performance in code completion. However, due to a lack of domain-specific knowledge, they may be suboptimal when completing code that requires intensive domain knowledge, for example, completing library names. Although several works have confirmed the effectiveness of fine-tuning techniques for adapting language models to code completion in specific domains, they are limited by the need to repeatedly fine-tune the model as the project iterates. To address this limitation, in this paper we propose kNM-LM, a retrieval-augmented language model (R-LM) that integrates domain knowledge into language models without fine-tuning. Unlike previous techniques, our approach automatically adapts to different language models and domains. Specifically, it uses in-domain code to build a retrieval-based database decoupled from the LM, and then combines the database with the LM through Bayesian inference to complete the code. Extensive experiments on intra-project and intra-scenario completion confirm that kNM-LM brings appreciable improvements over CodeGPT and UniXcoder. A deep analysis of our tool, covering response speed, storage usage, completion of specific code types, and API invocation completion, confirms that kNM-LM delivers satisfactory performance, rendering it highly appropriate for domain adaptive code completion. Furthermore, our approach operates without direct access to the language model's parameters. As a result, it can seamlessly integrate with black-box code completion models, making it easy to deploy as a plugin that further enhances their performance.
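The retrieval-plus-LM combination the abstract describes can be sketched in the style of the classic kNN-LM on which such approaches build: cache (context vector, next token) pairs from in-domain code, retrieve nearest neighbours at inference time, and mix the resulting distribution with the base LM's next-token distribution. The function below is a minimal illustration, not the paper's method: kNM-LM derives the mixing weight via Bayesian inference, whereas this sketch assumes a fixed weight `lam`, and all names here are hypothetical.

```python
import numpy as np

def knn_lm_next_token(lm_probs, query_vec, datastore_keys, datastore_tokens,
                      vocab_size, k=8, temperature=1.0, lam=0.25):
    """Interpolate a base LM's next-token distribution with a kNN
    distribution retrieved from a decoupled in-domain datastore.

    lm_probs:         (vocab_size,) base LM next-token probabilities
    query_vec:        (d,) hidden state of the current context
    datastore_keys:   (n, d) cached context vectors from in-domain code
    datastore_tokens: (n,) the token that followed each cached context
    """
    # 1. Retrieve the k nearest cached contexts by L2 distance.
    dists = np.linalg.norm(datastore_keys - query_vec, axis=1)
    nn = np.argsort(dists)[:k]

    # 2. Turn negated distances into a distribution over neighbours.
    weights = np.exp(-dists[nn] / temperature)
    weights /= weights.sum()

    # 3. Scatter neighbour weights onto their recorded next tokens
    #    (np.add.at accumulates correctly over duplicate tokens).
    knn_probs = np.zeros(vocab_size)
    np.add.at(knn_probs, datastore_tokens[nn], weights)

    # 4. Fixed-weight interpolation; kNM-LM instead infers this
    #    weight via Bayesian inference, so lam is a simplification.
    return lam * knn_probs + (1.0 - lam) * lm_probs
```

Because the datastore is decoupled from the model, updating it for a new project only requires re-encoding that project's code; the LM's parameters are never touched, which is what allows the black-box integration the abstract mentions.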
Pages: 421-433 (13 pages)
Related Papers (50 records)
  • [31] Cross-Domain Tibetan Named Entity Recognition via Large Language Models
    Zhang, Jin
    Gao, Fan
    Yeshi, Lobsang
    Tashi, Dorje
    Wang, Xiangshi
    Tashi, Nyima
    Luosang, Gadeng
    ELECTRONICS, 2025, 14 (01):
  • [32] An Evaluation of Domain-Specific Language Technologies for Code Generation
    Schmitt, Christian
    Kuckuk, Sebastian
    Kostler, Harald
    Hannig, Frank
    Teich, Jurgen
    2014 14TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND ITS APPLICATIONS (ICCSA), 2014, : 18 - 26
  • [33] Domain Adaptive Detector via Variational Inference
    Kim, Hwa-Rang
    Kim, Kwang-Ju
    Choi, Doo-Hyun
    2022 INTERNATIONAL CONFERENCE ON PLATFORM TECHNOLOGY AND SERVICE (PLATCON22), 2022, : 86 - 91
  • [34] Machine Translation Based on Domain Adaptive Language Model
    Li, Lingling
    Chen, Xianlong
    Xu, Yiling
    2020 16TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND SECURITY (CIS 2020), 2020, : 116 - 120
  • [35] Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models
    Wang, Boxin
    Ping, Wei
    Xiao, Chaowei
    Xu, Peng
    Patwary, Mostofa
    Shoeybi, Mohammad
    Li, Bo
    Anandkumar, Anima
    Catanzaro, Bryan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [36] Bridging the Domain Gap: Improve Informal Language Translation via Counterfactual Domain Adaptation
    Wang, Ke
    Chen, Guandan
    Huang, Zhongqiang
    Wan, Xiaojun
    Huang, Fei
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 13970 - 13978
  • [37] Towards Domain-Agnostic and Domain-Adaptive Dementia Detection from Spoken Language
    Farzana, Shahla
    Parde, Natalie
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023, : 11965 - 11978
  • [38] Domain Adaptive Cross-Modal Image Retrieval via Modality and Domain Translations
    Yanagi, Rintaro
    Togo, Ren
    Ogawa, Takahiro
    Haseyama, Miki
    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2021, E104A (06) : 866 - 875
  • [39] Cross-Domain Palmprint Recognition via Regularized Adversarial Domain Adaptive Hashing
    Du, Xuefeng
    Zhong, Dexing
    Shao, Huikai
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2021, 31 (06) : 2372 - 2385
  • [40] AdaComplete: improve DL-based code completion method’s domain adaptability
    Zejun Wang
    Fang Liu
    Yiyang Hao
    Zhi Jin
    Automated Software Engineering, 2023, 30