Domain Adaptive Code Completion via Language Models and Decoupled Domain Databases

Cited by: 1
Authors
Tang, Ze [1 ]
Ge, Jidong [1 ]
Liu, Shangqing [2 ]
Zhu, Tingwei [1 ]
Xu, Tongtong [3 ]
Huang, Liguo [4 ]
Luo, Bin [1 ]
Affiliations
[1] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing, Peoples R China
[2] Nanyang Technol Univ, Singapore, Singapore
[3] Huawei Software Engn Applicat Technol, Shenzhen, Peoples R China
[4] Southern Methodist Univ, Dept Comp Sci, Dallas, TX USA
Keywords
domain adaptive code completion; retrieval-augmented language model;
DOI
10.1109/ASE56229.2023.00076
CLC classification number
TP [Automation technology, computer technology];
Subject classification code
0812 ;
Abstract
Large Language Models (LLMs) have demonstrated remarkable performance in code completion. However, due to the lack of domain-specific knowledge, they may not be optimal at completing code that requires intensive domain knowledge, for example, completing library names. Although several works have confirmed the effectiveness of fine-tuning techniques for adapting language models to code completion in specific domains, they are limited by the need to repeatedly fine-tune the model as the project iterates. To address this limitation, in this paper we propose kNM-LM, a retrieval-augmented language model (R-LM) that integrates domain knowledge into language models without fine-tuning. Different from previous techniques, our approach automatically adapts to different language models and domains. Specifically, it utilizes in-domain code to build a retrieval-based database decoupled from the LM, and then combines it with the LM through Bayesian inference to complete the code. Extensive experiments on intra-project and intra-scenario completion confirm that kNM-LM brings appreciable enhancements over CodeGPT and UnixCoder. A deep analysis of our tool, covering response speed, storage usage, completion of specific code types, and API invocation completion, confirms that kNM-LM provides satisfactory performance, rendering it highly appropriate for domain adaptive code completion. Furthermore, our approach operates without direct access to the language model's parameters. As a result, it can seamlessly integrate with black-box code completion models, making it easy to deploy as a plugin that further enhances their performance.
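The retrieval-plus-LM combination the abstract describes can be sketched in the style of a kNN-LM: a datastore of (context-vector, next-token) pairs is built from in-domain code, queried at each completion step, and the resulting nearest-neighbor distribution is mixed with the base model's distribution. This is a minimal illustrative sketch only — the fixed interpolation weight `lam`, the toy 2-dimensional context vectors, and all function names are assumptions, standing in for the paper's actual Bayesian combination and learned representations.

```python
import math

def softmax(scores):
    """Turn a dict of real-valued scores into a probability distribution."""
    m = max(scores.values())
    exp = {t: math.exp(s - m) for t, s in scores.items()}
    z = sum(exp.values())
    return {t: v / z for t, v in exp.items()}

def sqdist(u, v):
    """Squared Euclidean distance between two context vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def knn_distribution(query, datastore, k=2):
    """Score candidate next tokens by the negative squared distance of
    their stored context keys to the query, keeping the k nearest
    datastore entries, then normalize with a softmax."""
    nearest = sorted(datastore, key=lambda kv: sqdist(query, kv[0]))[:k]
    scores = {}
    for key, token in nearest:
        d = sqdist(query, key)
        scores[token] = max(scores.get(token, -math.inf), -d)
    return softmax(scores)

def interpolate(p_lm, p_knn, lam=0.5):
    """Mix the base LM distribution with the retrieval distribution.
    A fixed weight lam stands in for kNM-LM's Bayesian combination."""
    vocab = set(p_lm) | set(p_knn)
    return {t: lam * p_knn.get(t, 0.0) + (1 - lam) * p_lm.get(t, 0.0)
            for t in vocab}

# Toy usage: a two-entry in-domain datastore nudges the completion
# toward a hypothetical domain-specific identifier.
datastore = [((0.0, 1.0), "domainLib"), ((1.0, 0.0), "print")]
p_knn = knn_distribution((0.0, 0.9), datastore, k=1)
p_lm = {"print": 0.8, "domainLib": 0.2}   # stand-in base-model probabilities
p_final = interpolate(p_lm, p_knn, lam=0.5)
```

Because the datastore is decoupled from the model, updating domain knowledge only means adding entries — no gradient updates touch the LM, which is what lets the technique wrap black-box completion models.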
Pages: 421 - 433
Page count: 13
Related papers
50 records total
  • [21] Malicious Domain Detection via Domain Relationship and Graph Models
    He, Wenxuan
    Gou, Gaopeng
    Kang, Cuicui
    Liu, Chang
    Li, Zhen
    Xiong, Gang
    2019 IEEE 38TH INTERNATIONAL PERFORMANCE COMPUTING AND COMMUNICATIONS CONFERENCE (IPCCC), 2019,
  • [22] Structure-Decoupled Adaptive Part Alignment Network for Domain Adaptive Mitochondria Segmentation
    Sun, Rui
    Mai, Huayu
    Luo, Naisong
    Zhang, Tianzhu
    Xiong, Zhiwei
    Wu, Feng
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2023, PT IV, 2023, 14223 : 523 - 533
  • [23] Cost-Efficient Domain-Adaptive Pretraining of Language Models for Optoelectronics Applications
    Huang, Dingyun
    Cole, Jacqueline M.
    JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2025, 65 (05) : 2476 - 2486
  • [24] Joint Spoken Language Understanding and Domain Adaptive Language Modeling
    Zhang, Huifeng
    Zhu, Su
    Fan, Shuai
    Yu, Kai
    INTELLIGENCE SCIENCE AND BIG DATA ENGINEERING, 2018, 11266 : 311 - 324
  • [25] Using out-of-domain data to improve on-domain language models
    Iyer, R
    Ostendorf, M
    Gish, H
    IEEE SIGNAL PROCESSING LETTERS, 1997, 4 (08) : 221 - 223
  • [26] Generating Database Access Code From Domain Models
    Khelifi, Nassima Yamouni
    Smialek, Michal
    Mekki, Rachida
    PROCEEDINGS OF THE 2015 FEDERATED CONFERENCE ON COMPUTER SCIENCE AND INFORMATION SYSTEMS, 2015, 5 : 991 - 996
  • [27] Quantifying Domain Knowledge in Large Language Models
    Sayenju, Sudhashree
    Aygun, Ramazan
    Franks, Bill
    Johnston, Sereres
    Lee, George
    Choi, Hansook
    Modgil, Girish
    2023 IEEE CONFERENCE ON ARTIFICIAL INTELLIGENCE, CAI, 2023, : 193 - 194
  • [28] Using AI-Based Code Completion for Domain-Specific Languages
    Piereder, Christina
    Fleck, Guenter
    Geist, Verena
    Moser, Michael
    Pichler, Josef
    PRODUCT-FOCUSED SOFTWARE PROCESS IMPROVEMENT, PROFES 2023, PT I, 2024, 14483 : 227 - 242
  • [29] An Empirical Investigation on the Performance of Domain Adaptation for T5 Code Completion
    Fukumoto, Daisuke
    Kashiwa, Yutaro
    Hirao, Toshiki
    Fujiwara, Kenji
    Iida, Hajimu
    2023 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE ANALYSIS, EVOLUTION AND REENGINEERING, SANER, 2023, : 693 - 697
  • [30] RASCAL: a Domain Specific Language for Source Code Analysis and Manipulation
    Klint, Paul
    van der Storm, Tijs
    Vinju, Jurgen
    2009 NINTH IEEE INTERNATIONAL WORKING CONFERENCE ON SOURCE CODE ANALYSIS AND MANIPULATION, PROCEEDINGS, 2009, : 168 - +