LLM-Based Test-Driven Interactive Code Generation: User Study and Empirical Evaluation

Cited by: 1
Authors
Fakhoury, Sarah [1]
Naik, Aaditya [2]
Sakkas, Georgios [3]
Chakraborty, Saikat [1]
Lahiri, Shuvendu K. [1]
Affiliations
[1] Microsoft Res, Redmond, WA 98052 USA
[2] Univ Penn, Philadelphia, PA 19104 USA
[3] Univ Calif San Diego, San Diego, CA 92037 USA
Keywords
Intent disambiguation; code generation; LLMs; human factors; cognitive load; test generation
DOI
10.1109/TSE.2024.3428972
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Large language models (LLMs) have shown great potential in automating significant aspects of coding by producing natural code from informal natural language (NL) intent. However, because NL is informal, it does not lend itself easily to checking that the generated code correctly satisfies the user intent. In this paper, we propose a novel interactive workflow, TiCoder, for guided intent clarification (i.e., partial formalization) through tests to support the generation of more accurate code suggestions. Through a mixed-methods user study with 15 programmers, we present an empirical evaluation of the effectiveness of the workflow in improving code generation accuracy. We find that participants using the proposed workflow are significantly more likely to correctly evaluate AI-generated code, and report significantly less task-induced cognitive load. Furthermore, we test the potential of the workflow at scale with four different state-of-the-art LLMs on two Python datasets, using an idealized proxy for user feedback. We observe an average absolute improvement of 45.97% in the pass@1 code generation accuracy for both datasets and across all LLMs within 5 user interactions, in addition to the automatic generation of accompanying unit tests.
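The abstract describes clarifying intent through tests but gives no algorithmic detail, so the following is only a minimal sketch of one way such a loop could work, not the authors' implementation. It assumes that candidate programs are plain Python functions, that a test is an (input, expected output) pair, and that the user answers yes or no to each proposed test. The names `passes`, `prune`, `clarify_intent`, and the simulated `oracle` are hypothetical and chosen for illustration.

```python
from typing import Callable, List, Tuple

Candidate = Callable[[int], int]
Test = Tuple[int, int]  # (input, expected output); an assumed test format


def passes(candidate: Candidate, test: Test) -> bool:
    """Return True if the candidate produces the expected output for the test."""
    arg, expected = test
    try:
        return candidate(arg) == expected
    except Exception:
        return False  # a crashing candidate counts as failing the test


def prune(candidates: List[Candidate], test: Test, approved: bool) -> List[Candidate]:
    """Keep candidates whose behavior agrees with the user's verdict on the test.

    Assumption: an approved test must pass; a rejected test must not pass.
    """
    return [c for c in candidates if passes(c, test) == approved]


def clarify_intent(
    candidates: List[Candidate],
    tests: List[Test],
    oracle: Callable[[Test], bool],
    max_interactions: int = 5,
) -> List[Candidate]:
    """Show up to max_interactions tests to the user (oracle) and prune after each."""
    remaining = list(candidates)
    for test in tests[:max_interactions]:
        remaining = prune(remaining, test, oracle(test))
    return remaining


# Hypothetical candidates for the informal intent "return the magnitude of x".
def identity(x: int) -> int:
    return x


def negate(x: int) -> int:
    return -x


def magnitude(x: int) -> int:
    return abs(x)


# Simulated user: approves a test only if it matches the intended behavior.
def oracle(test: Test) -> bool:
    return abs(test[0]) == test[1]


proposed_tests = [(3, 3), (-2, 2), (-2, -2)]
survivors = clarify_intent([identity, negate, magnitude], proposed_tests, oracle)
print([f.__name__ for f in survivors])
```

In this sketch the simulated oracle plays the role of an idealized proxy for user feedback, and the default cap of five tests mirrors the interaction budget mentioned in the abstract; how tests are generated and ranked is left out entirely.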
Pages: 2254-2268
Number of pages: 15