ChatGPT in Undergraduate Education: Performance of GPT-3.5 and Identification of AI-Generated Text in Introductory Neuroscience

Cited by: 0
Authors
Covington, Natalie V. [1 ,2 ]
Vruwink, Olivia [1 ]
Affiliations
[1] Univ Minnesota, Dept Speech Language Hearing Sci, Minneapolis, MN 55455 USA
[2] Allina Hlth, Courage Kenny Rehabil Inst, Minneapolis, MN 55406 USA
Keywords
Artificial intelligence; Large language models; Undergraduate education; Neuroscience
DOI
10.1007/s40593-024-00427-9
Chinese Library Classification (CLC)
TP39 (Applications of Computers)
Subject Classification
081203; 0835
Abstract
ChatGPT and other large language models (LLMs) have the potential to significantly disrupt common educational practices and assessments, given their capability to quickly generate human-like text in response to user prompts. The LLMs GPT-3.5 and GPT-4 have been tested against many standardized and high-stakes assessment materials (e.g., SAT, Uniform Bar Exam, GRE), demonstrating impressive but variable performance. Fewer studies have examined the performance of ChatGPT on course-level educational materials in ecologically valid grading contexts. Here, we examine the performance of GPT-3.5 on undergraduate course materials and assess the ability of teaching assistants to identify AI-generated responses interleaved with student work. GPT-3.5 was prompted to respond to questions drawn from undergraduate neuroscience assessments. These AI-generated responses were interleaved with student-authored responses and graded blindly using existing course rubrics. In addition, a subset of responses was rated for human-likeness by teaching assistants who were blind to author status (AI vs. student). In general, GPT-3.5 performed within one standard deviation of the class average, but there were cases in which ChatGPT-generated responses substantially outperformed or underperformed relative to student responses. Teaching assistants who were blind to author status were able to identify ChatGPT-generated responses with better-than-chance accuracy, and those with personal experience using ChatGPT were significantly more accurate than those without ChatGPT experience. Despite high levels of identification accuracy, none of the teaching assistant raters endorsed sufficient confidence in their identifications to support reporting the response as an instance of academic dishonesty in a real-world classroom setting.
Pages: 24
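The abstract makes two quantitative claims: that GPT-3.5 scored within one standard deviation of the class average, and that raters identified AI-generated responses at better-than-chance accuracy. Both map onto simple statistics. The following is a minimal sketch, not code from the paper, using entirely hypothetical numbers (the class mean, standard deviation, score, and rater counts are illustrative assumptions, not values reported by the authors):

```python
# Minimal sketch of the two statistics the abstract alludes to,
# computed on hypothetical numbers for illustration only.
from scipy.stats import binomtest

# 1) Is a GPT-3.5 score "within one standard deviation of the class average"?
# Hypothetical class statistics and model score (not from the paper).
class_mean, class_sd = 82.0, 7.5
gpt_score = 78.0
z = (gpt_score - class_mean) / class_sd  # z-score relative to the class
print(f"z = {z:.2f} -> within 1 SD: {abs(z) <= 1}")

# 2) Did raters identify AI-generated responses at better-than-chance accuracy?
# Hypothetical counts: 38 correct author-status judgments out of 50 trials,
# tested one-sided against a 50% chance baseline (AI vs. student).
result = binomtest(k=38, n=50, p=0.5, alternative="greater")
print(f"accuracy = {38 / 50:.2f}, one-sided p = {result.pvalue:.4f}")
```

A one-sided exact binomial test is one natural choice for the "better than chance" comparison when each judgment is a binary AI-vs-student call; the paper's actual analysis may differ.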