Creating Thorough Tests for AI-Generated Code is Hard

Authors
Singhal, Shreya [1]
Kumar, Viraj [2]
Affiliations
[1] Indian Inst Technol Madras, Chennai, Tamil Nadu, India
[2] Indian Inst Sci, Bangalore, Karnataka, India
DOI
10.1145/3627217.3627238
Chinese Library Classification
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Before implementing a function, programmers are encouraged to write a suite of test cases that specify its intended behaviour on several inputs. A suite of tests is thorough if any buggy implementation fails at least one of these tests. We posit that as the proportion of code generated by Large Language Models (LLMs) grows, so must the ability of students to create test suites that are thorough enough to detect subtle bugs in such code. Our paper makes two contributions. First, we demonstrate how difficult it can be to create thorough tests for LLM-generated code by evaluating 27 test suites from a public dataset (EvalPlus). Second, by identifying deficiencies in these test suites, we propose strategies for improving the ability of students to develop thorough test suites for LLM-generated code.
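The abstract's notion of thoroughness can be illustrated with a minimal sketch. Everything below is a hypothetical example and is not taken from the paper or from EvalPlus: the function names, the bug, and both test suites are assumptions chosen only to show how a plausible-looking suite can miss a subtle bug.

```python
from typing import Callable, List, Tuple

# Hypothetical task: return the largest element of a non-empty list.
def max_correct(xs: List[int]) -> int:
    best = xs[0]
    for x in xs[1:]:
        if x > best:
            best = x
    return best

# Hypothetical subtle bug: the running maximum starts at 0,
# so the result is wrong when every element is negative.
def max_buggy(xs: List[int]) -> int:
    best = 0
    for x in xs:
        if x > best:
            best = x
    return best

# A test case pairs an input with its expected output.
TestCase = Tuple[List[int], int]

def passes(impl: Callable[[List[int]], int], suite: List[TestCase]) -> bool:
    """Return True if impl produces the expected output on every test."""
    return all(impl(list(inp)) == expected for inp, expected in suite)

# This suite looks reasonable but contains only positive numbers,
# so the buggy implementation passes it.
weak_suite: List[TestCase] = [([1, 2, 3], 3), ([5], 5), ([2, 9, 4], 9)]

# Adding an all-negative input exposes this particular bug.
stronger_suite: List[TestCase] = weak_suite + [([-3, -1, -2], -1)]

if __name__ == "__main__":
    print(passes(max_buggy, weak_suite))      # buggy code slips through
    print(passes(max_buggy, stronger_suite))  # buggy code is caught
```

Note that the second suite is only shown to catch this one bug. Under the abstract's definition, a thorough suite would have to make every buggy implementation fail at least one test, which is a much stronger requirement.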
Pages: 108-111
Number of pages: 4