LLM-Based Test-Driven Interactive Code Generation: User Study and Empirical Evaluation

Cited by: 1

Authors:

Fakhoury, Sarah [1]

Naik, Aaditya [2]

Sakkas, Georgios [3]

Chakraborty, Saikat [1]

Lahiri, Shuvendu K. [1]

Affiliations:

[1] Microsoft Res, Redmond, WA 98052 USA

[2] Univ Penn, Philadelphia, PA 19104 USA

[3] Univ Calif San Diego, San Diego, CA 92037 USA

Source:

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING | 2024 / Vol. 50 / Issue 09

Keywords:

Intent disambiguation; code generation; LLMs; human factors; cognitive load; test generation;

DOI:

10.1109/TSE.2024.3428972

Chinese Library Classification (CLC) number:

TP31 [Computer Software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

Large language models (LLMs) have shown great potential in automating significant aspects of coding by producing natural code from informal natural language (NL) intent. However, given that NL is informal, it does not lend itself easily to checking that the generated code correctly satisfies the user intent. In this paper, we propose a novel interactive workflow, TiCoder, for guided intent clarification (i.e., partial formalization) through tests to support the generation of more accurate code suggestions. Through a mixed-methods user study with 15 programmers, we present an empirical evaluation of the effectiveness of the workflow in improving code generation accuracy. We find that participants using the proposed workflow are significantly more likely to correctly evaluate AI-generated code, and report significantly less task-induced cognitive load. Furthermore, we test the potential of the workflow at scale with four different state-of-the-art LLMs on two Python datasets, using an idealized proxy for user feedback. We observe an average absolute improvement of 45.97% in the pass@1 code generation accuracy for both datasets and across all LLMs within 5 user interactions, in addition to the automatic generation of accompanying unit tests.
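
The abstract describes the workflow only at a high level, so the following is a minimal sketch of what test-guided intent clarification could look like, not the TiCoder implementation. All names (`clarify_intent`, `passes`, `user_approves`, the toy candidates) are hypothetical. The sketch assumes candidate programs are plain Python callables, candidate tests are (input, expected output) pairs, and a rejected test is used to discard the candidates that pass it. The `user_approves` callback stands in for the user, loosely mirroring the idealized feedback proxy mentioned in the abstract.

```python
from typing import Callable, List, Tuple

# Assumed representations for this sketch only:
# a candidate is a one-argument function, a test is an (input, expected) pair.
Candidate = Callable[[int], int]
Test = Tuple[int, int]


def passes(code: Candidate, test: Test) -> bool:
    """Return True if the candidate produces the expected output for the test."""
    arg, expected = test
    try:
        return code(arg) == expected
    except Exception:
        # A crashing candidate is treated as failing the test.
        return False


def clarify_intent(
    codes: List[Candidate],
    tests: List[Test],
    user_approves: Callable[[Test], bool],
    max_interactions: int = 5,
) -> Tuple[List[Candidate], List[Test]]:
    """Show up to max_interactions tests to the user and prune candidates."""
    remaining = list(codes)
    approved: List[Test] = []
    for test in tests[:max_interactions]:
        if user_approves(test):
            # The test matches the intent: keep only candidates that pass it.
            approved.append(test)
            remaining = [c for c in remaining if passes(c, test)]
        else:
            # The test contradicts the intent: drop candidates that pass it.
            remaining = [c for c in remaining if not passes(c, test)]
    return remaining, approved


# Toy example: the NL intent "halve n" is ambiguous for odd inputs.
def floor_half(n: int) -> int:
    return n // 2


def ceil_half(n: int) -> int:
    return (n + 1) // 2


def double(n: int) -> int:
    return n * 2


candidate_tests = [(4, 2), (5, 3), (3, 1)]

# Hypothetical stand-in for the user, whose true intent is floor division.
def user_approves(test: Test) -> bool:
    return test[1] == test[0] // 2


remaining, approved = clarify_intent(
    [floor_half, ceil_half, double], candidate_tests, user_approves
)
print([f.__name__ for f in remaining])  # ['floor_half']
print(approved)                         # [(4, 2), (3, 1)]
```

In this toy run, the approved tests double as unit tests for the surviving candidate, which is the kind of by-product the abstract refers to.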

Citation

Pages: 2254-2268

Page count: 15

39 entries in total

[1] Test-Driven Code Review: An Empirical Study
Spadini, Davide
Palomba, Fabio
Baum, Tobias
Hanenberg, Stefan
Bruntink, Magiel
Bacchelli, Alberto
2019 IEEE/ACM 41ST INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE 2019), 2019, : 1061 - 1072
[2] ChatUniTest: A Framework for LLM-Based Test Generation
Chen, Yinghao
Hu, Zehao
Zhi, Chen
Han, Junxiao
Deng, Shuiguang
Yin, Jianwei
COMPANION PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON THE FOUNDATIONS OF SOFTWARE ENGINEERING, FSE COMPANION 2024, 2024, : 572 - 576
[3] Effective Context Selection in LLM-Based Leaderboard Generation: An Empirical Study
Kabongo, Salomon
D'Souza, Jennifer
Auer, Soren
NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, PT II, NLDB 2024, 2024, 14763 : 150 - 160
[4] Boosting LLM-Based Software Generation by Aligning Code with Requirements
Yaacov, Tom
Elyasaf, Achiya
Weiss, Gera
32ND INTERNATIONAL REQUIREMENTS ENGINEERING CONFERENCE WORKSHOPS, REW 2024, 2024, : 301 - 305
[5] LLM-based Control Code Generation using Image Recognition
Koziolek, Heiko
Koziolek, Anne
2024 INTERNATIONAL WORKSHOP ON LARGE LANGUAGE MODELS FOR CODE, LLM4CODE 2024, 2024, : 38 - 45
[6] LLM-based and Retrieval-Augmented Control Code Generation
Koziolek, Heiko
Gruener, Sten
Hark, Rhaban
Ashiwal, Virendra
Linsbauer, Sofia
Eskandani, Nafise
2024 INTERNATIONAL WORKSHOP ON LARGE LANGUAGE MODELS FOR CODE, LLM4CODE 2024, 2024, : 22 - 29
[7] LLM-Based Code Generation Method for Golang Compiler Testing
Gu, Qiuhan
PROCEEDINGS OF THE 31ST ACM JOINT MEETING EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING, ESEC/FSE 2023, 2023, : 2201 - 2203
[8] ChatAssert: LLM-Based Test Oracle Generation With External Tools Assistance
Hayet, Ishrak
Scott, Adam
d'Amorim, Marcelo
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2025, 51 (01) : 305 - 319
[9] Towards empirical evaluation of Test-Driven Development in a university environment
Pancur, M
Ciglaric, M
Trampus, M
Vidmar, T
IEEE REGION 8 EUROCON 2003, VOL B, PROCEEDINGS: COMPUTER AS A TOOL, 2003, : 83 - 86
[10] Test-Driven Development of Graphical User Interfaces: A Pilot Evaluation
Hellmann, Theodore D.
Hosseini-Khayat, Ali
Maurer, Frank
AGILE PROCESSES IN SOFTWARE ENGINEERING AND EXTREME PROGRAMMING, 2011, 77 : 223 - 237
