On the Effectiveness of Large Language Models in Domain-Specific Code Generation

Cited by: 1
Authors
Gu, Xiaodong [1 ]
Chen, Meng [1 ]
Lin, Yalan [1 ]
Hu, Yuhan [1 ]
Zhang, Hongyu [2 ]
Wan, Chengcheng [3 ]
Wei, Zhao [4 ]
Xu, Yong [4 ]
Wang, Juhong [4 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
[2] Chongqing Univ, Chongqing, Peoples R China
[3] East China Normal Univ, Shanghai, Peoples R China
[4] Tencent Inc, Beijing, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
large language models; code generation; domain-specific program generation;
DOI
10.1145/3697012
Chinese Library Classification (CLC)
TP31 [Computer Software];
Discipline Classification Codes
081202; 0835;
Abstract
Large language models (LLMs) such as ChatGPT have shown remarkable capabilities in code generation. Despite these achievements, they rely on enormous training data to acquire a broad spectrum of open-domain knowledge. Moreover, their evaluation revolves around open-domain benchmarks such as HumanEval, which consist primarily of programming-contest problems. It is therefore hard to fully characterize the intricacies and challenges associated with particular domains (e.g., web, game, and math). In this article, we conduct an in-depth study of LLMs in domain-specific code generation. Our results demonstrate that LLMs exhibit sub-optimal performance in generating domain-specific code because of their limited proficiency in utilizing domain-specific libraries. We further observe that incorporating API knowledge into prompts empowers LLMs to generate more professional code. Based on these findings, we investigate how to effectively incorporate API knowledge into the code generation process. We experiment with three strategies for incorporating domain knowledge, namely, external knowledge inquirer, chain-of-thought prompting, and chain-of-thought fine-tuning, and unify them into a new code generation approach called DomCoder. Experimental results show that all strategies of DomCoder improve the effectiveness of domain-specific code generation under certain settings.
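The following is a minimal, illustrative sketch (in Python) of the prompt-augmentation idea summarized in the abstract: API documentation snippets for a domain-specific library are injected into the prompt, and the model is asked to reason about which APIs to call before writing code. This is not the paper's DomCoder implementation; the function names, the prompt wording, and the `llm` callable are assumptions made purely for illustration.

    # Illustrative sketch only -- NOT the DomCoder implementation from the paper.
    # Idea: prepend retrieved API documentation to the task description and ask the
    # model to reason (chain-of-thought style) about which APIs to use before coding.
    from typing import Callable, List

    def build_domain_prompt(task: str, api_docs: List[str]) -> str:
        """Compose a prompt that exposes domain-specific API knowledge to the model."""
        doc_block = "\n".join(f"- {doc}" for doc in api_docs)
        return (
            "You are writing code that must use the domain-specific library below.\n"
            f"Relevant API documentation:\n{doc_block}\n\n"
            f"Task: {task}\n\n"
            "First list the APIs you will call and why, then write the final code."
        )

    def generate_domain_code(task: str, api_docs: List[str],
                             llm: Callable[[str], str]) -> str:
        """Send the knowledge-augmented prompt to any text-completion callable."""
        return llm(build_domain_prompt(task, api_docs))

    if __name__ == "__main__":
        # Toy stand-in for a real LLM client, used only so the sketch runs end to end.
        echo_llm = lambda prompt: f"# model output would appear here ({len(prompt)} prompt chars)"
        docs = ["pygame.display.set_mode(size) -> Surface: create the game window"]
        print(generate_domain_code("Open an 800x600 game window.", docs, echo_llm))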
Pages: 22
Related Articles
(50 records in total)
  • [21] SGL: A domain-specific language for large-scale analysis of open-source code
    Foo, Darius
    Yi, Ang Ming
    Yeo, Jason
    Sharma, Asankhaya
    2018 IEEE CYBERSECURITY DEVELOPMENT CONFERENCE (SECDEV 2018), 2018, : 61 - 68
  • [22] Hypnos: A domain-specific large language model for anesthesiology
    Wang, Zhonghai
    Jiang, Jie
    Zhan, Yibing
    Zhou, Bohao
    Li, Yanhong
    Zhang, Chong
    Yu, Baosheng
    Ding, Liang
    Jin, Hua
    Peng, Jun
    Lin, Xu
    Liu, Weifeng
    NEUROCOMPUTING, 2025, 624
  • [23] Runtime Code Generation for Interpreted Domain-Specific Modeling Languages
    Meyer, Tom
    Helms, Tobias
    Warnke, Tom
    Uhrmacher, Adelinde M.
    2018 WINTER SIMULATION CONFERENCE (WSC), 2018, : 605 - 615
  • [24] A Model Query Language for Domain-Specific Models
    Guo, Jiangmin
    Lu, Jinzhi
    Ding, Jie
    Wang, Guoxin
    2020 5TH INTERNATIONAL CONFERENCE ON MECHANICAL, CONTROL AND COMPUTER ENGINEERING (ICMCCE 2020), 2020, : 1203 - 1209
  • [25] Enhancing Synthetic Test Data Generation with Language Models Using a More Expressive Domain-Specific Language
    Tan, Chao
    Behjati, Razieh
    Arisholm, Erik
    TESTING SOFTWARE AND SYSTEMS, ICTSS 2023, 2023, 14131 : 21 - 39
  • [26] Augmenting Large Language Models via Vector Embeddings to Improve Domain-specific Responsiveness
    Wolfrath, Nathan M.
    Verhagen, Nathaniel B.
    Crotty, Bradley H.
    Somai, Melek
    Kothari, Anai N.
    JOVE-JOURNAL OF VISUALIZED EXPERIMENTS, 2024, (214):
  • [27] Empowering Large Language Models to Leverage Domain-Specific Knowledge in E-Learning
    Lu, Ruei-Shan
    Lin, Ching-Chang
    Tsao, Hsiu-Yuan
    APPLIED SCIENCES-BASEL, 2024, 14 (12):
  • [28] PreparedLLM: effective pre-pretraining framework for domain-specific large language models
    Chen, Zhou
    Lin, Ming
    Wang, Zimeng
    Zang, Mingrun
    Bai, Yuqi
    BIG EARTH DATA, 2024, 8 (04) : 649 - 672
  • [29] Language Models Learning for Domain-Specific Natural Language User Interaction
    Bai, Shuanhu
    Huang, Chien-Lin
    Tan, Yeow-Kee
    Ma, Bin
    2009 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND BIOMIMETICS (ROBIO 2009), VOLS 1-4, 2009, : 2480 - 2485
  • [30] Grammar-driven generation of domain-specific language tools
    Wu, Hui
    Proceedings of the Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA): 772 - 773