Domain Adaptive Code Completion via Language Models and Decoupled Domain Databases

Cited by: 1
Authors
Tang, Ze [1 ]
Ge, Jidong [1 ]
Liu, Shangqing [2 ]
Zhu, Tingwei [1 ]
Xu, Tongtong [3 ]
Huang, Liguo [4 ]
Luo, Bin [1 ]
Affiliations
[1] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing, Peoples R China
[2] Nanyang Technol Univ, Singapore, Singapore
[3] Huawei Software Engn Applicat Technol, Shenzhen, Peoples R China
[4] Southern Methodist Univ, Dept Comp Sci, Dallas, TX USA
Keywords
domain adaptive code completion; retrieval-augmented language model
DOI
10.1109/ASE56229.2023.00076
CLC Classification
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Large Language Models (LLMs) have demonstrated remarkable performance in code completion. However, lacking domain-specific knowledge, they can be suboptimal at completions that require intensive domain knowledge, for example, completing library names. Several works have confirmed the effectiveness of fine-tuning for adapting language models to code completion in specific domains, but fine-tuned models must be retrained whenever the project iterates. To address this limitation, we propose kNM-LM, a retrieval-augmented language model (R-LM) that integrates domain knowledge into language models without fine-tuning. Unlike previous techniques, our approach automatically adapts to different language models and domains. Specifically, it uses in-domain code to build a retrieval-based database decoupled from the LM, and then combines the database with the LM through Bayesian inference to complete the code. Extensive experiments on intra-project and intra-scenario completion confirm that kNM-LM brings appreciable improvements over CodeGPT and UnixCoder. A deeper analysis of our tool, covering response speed, storage usage, completion of specific token types, and API invocation completion, confirms that kNM-LM performs satisfactorily, making it well suited for domain adaptive code completion. Furthermore, our approach operates without direct access to the language model's parameters, so it can be attached to black-box code completion models as a plugin to further enhance their performance.
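The retrieval-plus-LM combination described in the abstract can be illustrated with a short sketch. The Python below is a minimal kNN-LM-style decoding step, not the paper's exact method: kNM-LM combines the two distributions through Bayesian inference, whereas this sketch uses a fixed interpolation weight LAM; the datastore contents, distance kernel, and helper names (knn_distribution, combined_next_token) are illustrative assumptions, and the LM's hidden state and token probabilities are stubbed with random values in place of a real model such as CodeGPT.

import numpy as np

VOCAB = 1000    # toy vocabulary size
DIM = 64        # hidden-state dimensionality
K = 8           # neighbors retrieved per decoding step
LAM = 0.25      # fixed interpolation weight (illustrative assumption)

rng = np.random.default_rng(0)

# Datastore built offline from in-domain code, decoupled from the LM:
# one (context hidden state, observed next-token id) pair per position.
keys = rng.normal(size=(5000, DIM)).astype(np.float32)
values = rng.integers(0, VOCAB, size=5000)

def knn_distribution(query):
    # Squared L2 distance from the query context to every datastore key.
    d2 = ((keys - query) ** 2).sum(axis=1)
    idx = np.argpartition(d2, K)[:K]          # indices of the K nearest keys
    d = d2[idx]
    w = np.exp(-(d - d.min()))                # stabilized softmax over -distance
    w /= w.sum()
    p = np.zeros(VOCAB)
    np.add.at(p, values[idx], w)              # aggregate neighbor votes per token id
    return p

def combined_next_token(hidden_state, lm_probs):
    # Mix the retrieval distribution with the LM distribution, then decode.
    p = LAM * knn_distribution(hidden_state) + (1 - LAM) * lm_probs
    return int(p.argmax())

# One decoding step with stubbed LM outputs standing in for a real model.
hidden_state = rng.normal(size=DIM).astype(np.float32)
lm_probs = rng.dirichlet(np.ones(VOCAB))
print(combined_next_token(hidden_state, lm_probs))

The point of the decoupled design is visible here: the datastore (keys, values) can be rebuilt from the project's latest code at any time without touching the LM's weights, which is why the approach needs no fine-tuning and works with black-box completion models.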
Pages: 421-433
Page count: 13