Can ChatGPT Support Developers? An Empirical Evaluation of Large Language Models for Code Generation

Cited by: 3
Authors
Jin, Kailun [1 ]
Wang, Chung-Yu [1 ]
Hung Viet Pham [1 ]
Hemmati, Hadi [1 ]
Affiliations
[1] York Univ, Toronto, ON, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
DOI
10.1145/3643991.3645074
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Large language models (LLMs) have demonstrated notable proficiency in code generation, with numerous prior studies showing their promising capabilities in various development scenarios. However, these studies mainly provide evaluations in research settings, which leaves a significant gap in understanding how effectively LLMs can support developers in real-world settings. To address this, we conducted an empirical analysis of conversations in DevGPT, a dataset collected from developers' conversations with ChatGPT (captured with the Share Link feature on platforms such as GitHub). Our empirical findings indicate that the current practice of using LLM-generated code is typically limited to either demonstrating high-level concepts or providing examples in documentation, rather than being used as production-ready code. These findings indicate that much future work is needed to improve LLMs in code generation before they can be integral parts of modern software development.
Pages: 167-171
Page count: 5
Related papers
50 in total
  • [1] Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
    Liu, Jiawei
    Xia, Chunqiu Steven
    Wang, Yuyao
    Zhang, Lingming
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [2] ChatGPT on ECT Can Large Language Models Support Psychoeducation?
    Lundin, Robert M.
    Berk, Michael
    Ostergaard, Soren Dinesen
    JOURNAL OF ECT, 2023, 39 (03) : 130 - 133
  • [3] Balancing Security and Correctness in Code Generation: An Empirical Study on Commercial Large Language Models
    Black, Gavin S.
    Rimal, Bhaskar P.
    Vaidyan, Varghese Mathew
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2025, 9 (01): : 419 - 430
  • [4] Can ChatGPT Truly Overcome Other Large Language Models?
    Ray, Partha
    CANADIAN ASSOCIATION OF RADIOLOGISTS JOURNAL-JOURNAL DE L ASSOCIATION CANADIENNE DES RADIOLOGISTES, 2024, 75 (02): : 429 - 429
  • [5] FormalEval: A Method for Automatic Evaluation of Code Generation via Large Language Models
    Yang, Sichao
    Yang, Ye
    2024 INTERNATIONAL SYMPOSIUM OF ELECTRONICS DESIGN AUTOMATION, ISEDA 2024, 2024, : 660 - 665
  • [6] An Empirical Evaluation of Large Language Models in Static Code Analysis for PHP Vulnerability Detection
    Cetin, Orcun
    Ekmekcioglu, Emre
    Arief, Budi
    Hernandez-Castro, Julio
    JOURNAL OF UNIVERSAL COMPUTER SCIENCE, 2024, 30 (09) : 1163 - 1183
  • [7] Can large language models generate geospatial code?
    State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, China
    Unknown
    arXiv, 1600,
  • [8] Can Large Language Models Write Parallel Code?
    Nichols, Daniel
    Davis, Joshua H.
    Xie, Zhaojun
    Rajaram, Arjun
    Bhatele, Abhinav
    PROCEEDINGS OF THE 33RD INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE PARALLEL AND DISTRIBUTED COMPUTING, HPDC 2024, 2024,
  • [9] An Empirical Evaluation of Using Large Language Models for Automated Unit Test Generation
    Schafer, Max
    Nadi, Sarah
    Eghbali, Aryaz
    Tip, Frank
    IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2024, 50 (01) : 85 - 105
  • [10] Evaluation of ChatGPT and Gemini large language models for pharmacometrics with NONMEM
    Shin, Euibeom
    Yu, Yifan
    Bies, Robert R.
    Ramanathan, Murali
    JOURNAL OF PHARMACOKINETICS AND PHARMACODYNAMICS, 2024, 51 (03) : 187 - 197