LLM-Based Test-Driven Interactive Code Generation: User Study and Empirical Evaluation

Cited by: 1

Authors:

Fakhoury, Sarah [1]

Naik, Aaditya [2]

Sakkas, Georgios [3]

Chakraborty, Saikat [1]

Lahiri, Shuvendu K. [1]

Affiliations:

[1] Microsoft Res, Redmond, WA 98052 USA

[2] Univ Penn, Philadelphia, PA 19104 USA

[3] Univ Calif San Diego, San Diego, CA 92037 USA

Source:

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING | 2024 / Vol. 50 / Issue 09

Keywords:

Intent disambiguation; code generation; LLMs; human factors; cognitive load; test generation;

DOI:

10.1109/TSE.2024.3428972

Chinese Library Classification (CLC) number:

TP31 [Computer Software];

Discipline classification code:

081202; 0835;

Abstract:

Large language models (LLMs) have shown great potential in automating significant aspects of coding by producing natural code from informal natural language (NL) intent. However, because NL is informal, it does not lend itself easily to checking that the generated code correctly satisfies the user intent. In this paper, we propose a novel interactive workflow, TiCoder, for guided intent clarification (i.e., partial formalization) through tests to support the generation of more accurate code suggestions. Through a mixed-methods user study with 15 programmers, we present an empirical evaluation of the effectiveness of the workflow to improve code generation accuracy. We find that participants using the proposed workflow are significantly more likely to correctly evaluate AI-generated code, and report significantly less task-induced cognitive load. Furthermore, we test the potential of the workflow at scale with four different state-of-the-art LLMs on two Python datasets, using an idealized proxy for user feedback. We observe an average absolute improvement of 45.97% in the pass@1 code generation accuracy for both datasets and across all LLMs within 5 user interactions, in addition to the automatic generation of accompanying unit tests.


Pages: 2254-2268

Page count: 15
