Analysis of large-language model versus human performance for genetics questions

Cited by: 52
Authors
Duong, Dat [1]
Solomon, Benjamin D. [1]
Affiliations
[1] Natl Human Genome Res Inst, Med Genet Branch, Med Genom Unit, Bethesda, MD 20894 USA
Funding
National Institutes of Health (USA)
DOI
10.1038/s41431-023-01396-8
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010 ; 081704 ;
Abstract
Large-language models like ChatGPT have recently received a great deal of attention. One area of interest is how these models could be used in biomedical contexts, including human genetics. To assess one facet of this, we compared the performance of ChatGPT with that of human respondents (13,642 human responses) in answering 85 multiple-choice questions about aspects of human genetics. Overall, ChatGPT did not perform significantly differently from human respondents (p = 0.8327); ChatGPT was 68.2% accurate, compared to 66.6% accuracy for human respondents. Both ChatGPT and humans performed better on memorization-type questions than on critical-thinking questions (p < 0.0001). When asked the same question multiple times, ChatGPT frequently provided different answers (16% of initial responses), including for both initially correct and initially incorrect answers, and gave plausible explanations for both correct and incorrect answers. ChatGPT's performance was impressive, but it currently shows significant shortcomings for clinical or other high-stakes use. Addressing these limitations will be important to guide adoption in real-life situations.
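As a quick arithmetic check on the accuracy figures reported in the abstract, the sketch below converts each percentage back into the nearest whole number of correct answers. The helper name `implied_correct` is hypothetical, and the human figure assumes that the 66.6% accuracy was pooled over all 13,642 responses, which the abstract does not state. The counts are back-calculated, not taken from the paper.

```python
def implied_correct(accuracy_pct: float, total: int) -> int:
    """Nearest whole number of correct answers consistent with a reported percentage."""
    return round(accuracy_pct / 100 * total)


def as_percent(correct: int, total: int) -> float:
    """Accuracy as a percentage rounded to one decimal place."""
    return round(100 * correct / total, 1)


# ChatGPT: 68.2% accuracy on 85 multiple-choice questions (figures from the abstract).
chatgpt_total = 85
chatgpt_correct = implied_correct(68.2, chatgpt_total)

# Humans: 66.6% accuracy over 13,642 responses.
# ASSUMPTION: accuracy is pooled over all responses rather than averaged per question.
human_total = 13_642
human_correct = implied_correct(66.6, human_total)

print("ChatGPT:", chatgpt_correct, "of", chatgpt_total,
      "->", as_percent(chatgpt_correct, chatgpt_total), "%")
print("Humans: ", human_correct, "of", human_total,
      "->", as_percent(human_correct, human_total), "%")
```

Under these assumptions the ChatGPT figure corresponds to 58 correct answers out of 85. The sketch does not reproduce the reported p-values, since the abstract does not name the statistical test used.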
Pages: 466-468
Page count: 3
Related papers
50 in total
  • [1] Analysis of large-language model versus human performance for genetics questions
    Dat Duong
    Benjamin D. Solomon
    [J]. European Journal of Human Genetics, 2024, 32 : 466 - 468
  • [2] Response to correspondence regarding “Analysis of large-language model versus human performance for genetics questions”
    Dat Duong
    Benjamin D. Solomon
    [J]. European Journal of Human Genetics, 2024, 32 : 379 - 380
  • [3] Performance of a Large-Language Model in scoring construction management capstone design projects
    Castelblanco, Gabriel
    Cruz-Castro, Laura
    Yang, Zhenlin
    [J]. COMPUTER APPLICATIONS IN ENGINEERING EDUCATION, 2024,
  • [4] Understanding Large-Language Model (LLM)-powered Human-Robot Interaction
    Kim, Callie Y.
    Lee, Christine P.
    Mutlu, Bilge
    [J]. PROCEEDINGS OF THE 2024 ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION, HRI 2024, 2024, : 371 - 380
  • [5] Comparative Analysis of Multimodal Large Language Model Performance on Clinical Vignette Questions
    Han, Tianyu
    Adams, Lisa C.
    Bressem, Keno K.
    Busch, Felix
    Nebelung, Sven
    Truhn, Daniel
    [J]. JAMA-JOURNAL OF THE AMERICAN MEDICAL ASSOCIATION, 2024, 331 (15) : 1320 - 1321
  • [6] Towards Large-Language Model Assisted Layout of Silicon Photonic Integrated Circuits
    Liu, Jason
    Sharma, Ankita
    Doumbia, Cheick
    Poon, Joyce K. S.
    [J]. 25TH EUROPEAN CONFERENCE ON INTEGRATED OPTICS, ECIO 2024, 2024, 402 : 441 - 447
  • [7] Performance of a Large Language Model on Practice Questions for the Neonatal Board Examination
    Beam, Kristyn
    Sharma, Puneet
    Kumar, Bhawesh
    Wang, Cindy
    Brodsky, Dara
    Martin, Camilia R.
    Beam, Andrew
    [J]. JAMA PEDIATRICS, 2023, 177 (09) : 977 - 979
  • [9] Performance of large language model artificial intelligence on dermatology board exam questions
    Park, Lily
    Ehlert, Brittany
    Susla, Lyudmyla
    Lum, Zachary C.
    Lee, Patrick K.
    [J]. CLINICAL AND EXPERIMENTAL DERMATOLOGY, 2023, 49 (07) : 733 - 734
  • [10] Performance and risk of harm of a large language model on dermatology continuing medical education questions
    Chen, M. L.
    Cai, Z. Ran
    Kim, J.
    Novoa, R.
    Barnes, L. A.
    Beam, A.
    Linos, E.
    [J]. JOURNAL OF INVESTIGATIVE DERMATOLOGY, 2024, 144 (08) : S25 - S25