Artificial intelligence large language model scores highly on focused practice designation in metabolic and bariatric surgery board practice questions

Cited by: 0
Authors
Sanders, A. [1 ,4 ]
Lim, R. [2 ]
Jones, D. [3 ]
Vosburg, R. W. [4 ]
Affiliations
[1] Beth Israel Deaconess Med Ctr, Dept Surg, Boston, MA USA
[2] Atrium Hlth, Charlotte, NC USA
[3] Rutgers New Jersey Med Sch, Dept Surg, Newark, NJ USA
[4] Grand Strand Med Ctr, Dept Surg, Myrtle Beach, SC 29572 USA
Keywords
Artificial intelligence; AI; ChatGPT; Metabolic and bariatric surgery; Exam; Performance; GPT-4
DOI
10.1007/s00464-024-11267-y
Chinese Library Classification (CLC) number
R61 [Operative surgery]
Abstract
Background: Artificial intelligence models such as ChatGPT (OpenAI) have performed well on examinations in various medical and surgical fields. It is not yet known how ChatGPT performs on comparable metabolic and bariatric surgery (MBS) questions.
Objective: To assess the performance of ChatGPT on Focused Practice Designation in Metabolic and Bariatric Surgery (FPD-MBS) board-style questions.
Setting: United States.
Methods: Questions obtained from the largest commercially available bank of FPD-MBS practice questions were entered into ChatGPT-4 as is, without prior training. We assessed the overall percentage correct as well as the percentage correct within each of the five American Board of Surgery (ABS) question categories. One-way ANOVA was used to determine whether the frequency of correct answers differed between categories.
Results: Out of 255 questions, ChatGPT-4 correctly answered 189 (74.1%). There was no difference in the frequency of correct answers between the five question categories (p = 0.22). Performance did not differ whether questions were entered individually or in groups of up to 10.
Conclusion: Without prior training, ChatGPT-4 scored highly when evaluated on the largest practice question bank for the FPD-MBS exam.
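The Results section reports 189 of 255 questions correct (74.1%) and a one-way ANOVA across the five ABS question categories. The sketch below illustrates that style of analysis in plain Python; the per-category outcome lists are hypothetical placeholders, not the study's actual data.

```python
# Reported headline result: 189 of 255 questions answered correctly.
print(f"Overall accuracy: {189 / 255:.1%}")  # 74.1%

def one_way_anova_f(groups):
    """One-way ANOVA F statistic across a list of sample groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares (variation of category means)
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares (variation inside each category)
    ssw = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    return (ssb / (k - 1)) / (ssw / (n - k))

# Hypothetical per-question outcomes (1 = correct, 0 = incorrect) for
# five categories; illustrative values only, not the study's data.
groups = [
    [1, 1, 0, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 1, 1, 0, 1],
    [1, 1, 1, 0, 1, 0, 1, 1],
    [0, 1, 1, 1, 0, 1, 1, 1],
    [1, 1, 0, 1, 1, 1, 0, 1],
]
print(f"F statistic: {one_way_anova_f(groups):.3f}")
```

In practice the F statistic would be compared against the F distribution with (k − 1, N − k) degrees of freedom to obtain a p-value, as the study does (p = 0.22).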
Pages: 6678–6681 (4 pages)