Domain Adaptive Code Completion via Language Models and Decoupled Domain Databases

Cited by: 1
Authors
Tang, Ze [1 ]
Ge, Jidong [1 ]
Liu, Shangqing [2 ]
Zhu, Tingwei [1 ]
Xu, Tongtong [3 ]
Huang, Liguo [4 ]
Luo, Bin [1 ]
Affiliations
[1] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing, Peoples R China
[2] Nanyang Technol Univ, Singapore, Singapore
[3] Huawei Software Engn Applicat Technol, Shenzhen, Peoples R China
[4] Southern Methodist Univ, Dept Comp Sci, Dallas, TX USA
Keywords
domain adaptive code completion; retrieval-augmented language model
DOI
10.1109/ASE56229.2023.00076
Chinese Library Classification
TP [Automation technology; computer technology]
Subject Classification Code
0812
Abstract
Large Language Models (LLMs) have demonstrated remarkable performance in code completion. However, due to a lack of domain-specific knowledge, they may be suboptimal when completing code that requires intensive domain knowledge, for example, completing library names. Although several works have confirmed the effectiveness of fine-tuning techniques for adapting language models to code completion in specific domains, they are limited by the need to repeatedly fine-tune the model as the project iterates. To address this limitation, in this paper we propose kNM-LM, a retrieval-augmented language model (R-LM) that integrates domain knowledge into language models without fine-tuning. Unlike previous techniques, our approach automatically adapts to different language models and domains. Specifically, it uses in-domain code to build a retrieval-based database decoupled from the LM, and then combines the database with the LM through Bayesian inference to complete the code. Extensive experiments on intra-project and intra-scenario completion confirm that kNM-LM brings appreciable improvements over CodeGPT and UniXcoder. A deep analysis of our tool, covering response speed, storage usage, completion of specific code types, and API invocation completion, confirms that kNM-LM delivers satisfactory performance, rendering it highly appropriate for domain adaptive code completion. Furthermore, our approach operates without direct access to the language model's parameters. As a result, it can seamlessly integrate with black-box code completion models, making it easy to deploy as a plugin that further enhances their performance.
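The retrieval-plus-LM combination the abstract describes can be sketched in the style of the classic kNN-LM on which such approaches build: cache (context vector, next token) pairs from in-domain code, retrieve nearest neighbours at inference time, and mix the resulting distribution with the base LM's next-token distribution. The function below is a minimal illustration, not the paper's method: kNM-LM derives the mixing weight via Bayesian inference, whereas this sketch assumes a fixed weight `lam`, and all names here are hypothetical.

```python
import numpy as np

def knn_lm_next_token(lm_probs, query_vec, datastore_keys, datastore_tokens,
                      vocab_size, k=8, temperature=1.0, lam=0.25):
    """Interpolate a base LM's next-token distribution with a kNN
    distribution retrieved from a decoupled in-domain datastore.

    lm_probs:         (vocab_size,) base LM next-token probabilities
    query_vec:        (d,) hidden state of the current context
    datastore_keys:   (n, d) cached context vectors from in-domain code
    datastore_tokens: (n,) the token that followed each cached context
    """
    # 1. Retrieve the k nearest cached contexts by L2 distance.
    dists = np.linalg.norm(datastore_keys - query_vec, axis=1)
    nn = np.argsort(dists)[:k]

    # 2. Turn negated distances into a distribution over neighbours.
    weights = np.exp(-dists[nn] / temperature)
    weights /= weights.sum()

    # 3. Scatter neighbour weights onto their recorded next tokens
    #    (np.add.at accumulates correctly over duplicate tokens).
    knn_probs = np.zeros(vocab_size)
    np.add.at(knn_probs, datastore_tokens[nn], weights)

    # 4. Fixed-weight interpolation; kNM-LM instead infers this
    #    weight via Bayesian inference, so lam is a simplification.
    return lam * knn_probs + (1.0 - lam) * lm_probs
```

Because the datastore is decoupled from the model, updating it for a new project only requires re-encoding that project's code; the LM's parameters are never touched, which is what allows the black-box integration the abstract mentions.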
Pages: 421-433 (13 pages)
Related Papers (50 records)
  • [31] Cross-Domain Tibetan Named Entity Recognition via Large Language Models
    Zhang, Jin
    Gao, Fan
    Yeshi, Lobsang
    Tashi, Dorje
    Wang, Xiangshi
    Tashi, Nyima
    Luosang, Gadeng
    ELECTRONICS, 2025, 14 (01):
  • [32] An Evaluation of Domain-Specific Language Technologies for Code Generation
    Schmitt, Christian
    Kuckuk, Sebastian
    Kostler, Harald
    Hannig, Frank
    Teich, Jurgen
    2014 14TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND ITS APPLICATIONS (ICCSA), 2014, : 18 - 26
  • [33] Domain Adaptive Detector via Variational Inference
    Kim, Hwa-Rang
    Kim, Kwang-Ju
    Choi, Doo-Hyun
    2022 INTERNATIONAL CONFERENCE ON PLATFORM TECHNOLOGY AND SERVICE (PLATCON22), 2022, : 86 - 91
  • [34] Machine Translation Based on Domain Adaptive Language Model
    Li, Lingling
    Chen, Xianlong
    Xu, Yiling
    2020 16TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND SECURITY (CIS 2020), 2020, : 116 - 120
  • [35] Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models
    Wang, Boxin
    Ping, Wei
    Xiao, Chaowei
    Xu, Peng
    Patwary, Mostofa
    Shoeybi, Mohammad
    Li, Bo
    Anandkumar, Anima
    Catanzaro, Bryan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [36] Bridging the Domain Gap: Improve Informal Language Translation via Counterfactual Domain Adaptation
    Wang, Ke
    Chen, Guandan
    Huang, Zhongqiang
    Wan, Xiaojun
    Huang, Fei
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 13970 - 13978
  • [37] Towards Domain-Agnostic and Domain-Adaptive Dementia Detection from Spoken Language
    Farzana, Shahla
    Parde, Natalie
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023, : 11965 - 11978
  • [38] Domain Adaptive Cross-Modal Image Retrieval via Modality and Domain Translations
    Yanagi, Rintaro
    Togo, Ren
    Ogawa, Takahiro
    Haseyama, Miki
    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2021, E104A (06) : 866 - 875
  • [39] Cross-Domain Palmprint Recognition via Regularized Adversarial Domain Adaptive Hashing
    Du, Xuefeng
    Zhong, Dexing
    Shao, Huikai
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2021, 31 (06) : 2372 - 2385
  • [40] AdaComplete: improve DL-based code completion method’s domain adaptability
    Zejun Wang
    Fang Liu
    Yiyang Hao
    Zhi Jin
    Automated Software Engineering, 2023, 30