LLM-Based Test-Driven Interactive Code Generation: User Study and Empirical Evaluation

Cited by: 1
Authors
Fakhoury, Sarah [1]
Naik, Aaditya [2]
Sakkas, Georgios [3]
Chakraborty, Saikat [1]
Lahiri, Shuvendu K. [1]
Affiliations
[1] Microsoft Res, Redmond, WA 98052 USA
[2] Univ Penn, Philadelphia, PA 19104 USA
[3] Univ Calif San Diego, San Diego, CA 92037 USA
Keywords
Intent disambiguation; code generation; LLMs; human factors; cognitive load; test generation
DOI
10.1109/TSE.2024.3428972
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Large language models (LLMs) have shown great potential in automating significant aspects of coding by producing natural code from informal natural language (NL) intent. However, because NL is informal, it does not lend itself easily to checking that the generated code correctly satisfies the user intent. In this paper, we propose a novel interactive workflow, TiCoder, for guided intent clarification (i.e., partial formalization) through tests to support the generation of more accurate code suggestions. Through a mixed-methods user study with 15 programmers, we present an empirical evaluation of the effectiveness of the workflow in improving code generation accuracy. We find that participants using the proposed workflow are significantly more likely to correctly evaluate AI-generated code, and report significantly less task-induced cognitive load. Furthermore, we test the potential of the workflow at scale with four different state-of-the-art LLMs on two Python datasets, using an idealized proxy for user feedback. We observe an average absolute improvement of 45.97% in pass@1 code generation accuracy for both datasets and across all LLMs within 5 user interactions, in addition to the automatic generation of accompanying unit tests.
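The idea of clarifying intent through tests can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`prune_with_tests`, `passes`, `oracle`), the pruning rule for rejected tests, and the toy candidates are all assumptions made for illustration. The sketch assumes each candidate program is shown against a proposed test, a user (here an idealized oracle backed by a reference function) approves or rejects the test, and candidates inconsistent with that answer are discarded.

```python
from typing import Callable, List, Tuple

Candidate = Callable[[int], int]
Test = Tuple[int, int]  # (input, expected output)


def passes(candidate: Candidate, test: Test) -> bool:
    """Return True if the candidate produces the expected output for the test."""
    arg, expected = test
    try:
        return candidate(arg) == expected
    except Exception:
        return False


def prune_with_tests(
    candidates: List[Candidate],
    tests: List[Test],
    oracle: Callable[[Test], bool],
    max_interactions: int = 5,
) -> Tuple[List[Candidate], List[Test]]:
    """Hypothetical pruning loop: one oracle query per test, up to a budget."""
    remaining = list(candidates)
    approved: List[Test] = []
    for test in tests[:max_interactions]:
        if oracle(test):
            # Test matches the intent: keep it and drop candidates that fail it.
            approved.append(test)
            remaining = [c for c in remaining if passes(c, test)]
        else:
            # Assumed rule: a rejected test rules out candidates that satisfy it.
            remaining = [c for c in remaining if not passes(c, test)]
    return remaining, approved


# Toy candidates for the intent "return the absolute value of x".
def identity(x: int) -> int:
    return x


def negate(x: int) -> int:
    return -x


def absolute(x: int) -> int:
    return abs(x)


# Idealized user proxy: approves a test only if a reference solution agrees.
def oracle(test: Test) -> bool:
    return abs(test[0]) == test[1]


proposed_tests: List[Test] = [(3, 3), (-2, 2), (-2, -2)]
remaining, approved = prune_with_tests(
    [identity, negate, absolute], proposed_tests, oracle
)
print([c.__name__ for c in remaining], approved)
```

In this toy run the first test removes `negate`, the second removes `identity`, and the rejected third test removes nothing further, so the approved tests double as unit tests for the surviving candidate.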
Pages: 2254-2268
Page count: 15