Does Google's Bard Chatbot perform better than ChatGPT on the European hand surgery exam?

Cited by: 9
Authors
Thibaut, Goetsch [1]
Dabbagh, Armaghan [2]
Liverneaux, Philippe [3, 4]
Affiliations
[1] Strasbourg Univ Hosp, Dept Publ Hlth, FMTS, GMRC, 1 Ave lhopital, F-67000 Strasbourg, France
[2] Univ Toronto, Fac Med, Toronto, ON, Canada
[3] Strasbourg Univ, ICube, UMR7357, CNRS, 2-4 rue Boussingault, F-67000 Strasbourg, France
[4] Strasbourg Univ Hosp, Dept Hand Surg, FMTS, 1 Ave Moliere, F-67200 Strasbourg, France
Keywords
Bard; ChatGPT; Chatbot; Hand Surgery; Multiple-choice question; Artificial intelligence; Board
DOI
10.1007/s00264-023-06034-y
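Chinese Library Classification (CLC) number
R826.8 [Plastic surgery]; R782.2 [Oral and maxillofacial plastic surgery]; R726.2 [Pediatric plastic surgery]; R62 [Plastic surgery (reconstructive surgery)]
Abstract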
Purpose: According to previous research, the chatbot ChatGPT® V3.5 was unable to pass the first part of the European Board of Hand Surgery (EBHS) diploma examination. This study aimed to investigate whether Google's chatbot Bard® would outperform ChatGPT® on the EBHS diploma examination.
Methods: Chatbots were asked to answer 18 EBHS multiple-choice questions (MCQs) published in the Journal of Hand Surgery (European Volume) in five trials (A1 to A5). After A3, the chatbots were given the correct answers, and after A4, incorrect answers. Their ability to modify their responses was then measured and compared.
Results: Bard® scored 3/18 (A1), 1/18 (A2), 4/18 (A3) and 2/18 (A4 and A5). The average percentage of correct answers was 61.1% for A1, 62.2% for A2, 64.4% for A3, 65.6% for A4, 63.3% for A5 and 63.3% for all trials combined. Agreement was moderate from A1 to A5 (kappa = 0.62, 95% CI [0.51, 0.73]) as well as from A1 to A3 (kappa = 0.60, 95% CI [0.47, 0.74]). The formulation of Bard® responses was homogeneous, but its learning capacity is still developing.
Conclusions: The main hypothesis of the study was not supported, since Bard® did not score significantly higher than ChatGPT® when answering the MCQs of the EBHS diploma exam. Neither ChatGPT® nor Bard®, in their current versions, can pass the first part of the EBHS diploma exam.
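The abstract reports agreement across repeated trials as a kappa value but does not say which kappa statistic was used. As an illustration only, the sketch below assumes Fleiss' kappa, a common choice when the same items are rated in more than two rounds. The function name `fleiss_kappa` and the toy answers are hypothetical and are not data from the study; the confidence interval is not computed here.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of labels (one per trial).

    Assumption: every item was answered in the same number of trials.
    """
    n_items = len(ratings)
    n_trials = len(ratings[0])
    if any(len(row) != n_trials for row in ratings):
        raise ValueError("every item needs the same number of trials")

    categories = sorted({label for row in ratings for label in row})
    counts = [Counter(row) for row in ratings]

    # Observed agreement: share of agreeing trial pairs, averaged over items.
    per_item = [
        (sum(c[k] ** 2 for k in categories) - n_trials)
        / (n_trials * (n_trials - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall share of each category.
    shares = [sum(c[k] for c in counts) / (n_items * n_trials) for k in categories]
    p_chance = sum(s ** 2 for s in shares)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when only one category is used")

    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical toy data: 4 true/false items answered in 5 trials (A1 to A5).
toy_answers = [
    [True, True, True, True, True],
    [False, False, False, False, False],
    [True, True, False, True, True],
    [False, True, False, False, True],
]
print(round(fleiss_kappa(toy_answers), 2))
```

A value near 1 means the chatbot gave nearly the same answers in every trial, while a value near 0 means agreement no better than chance.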
Pages: 151-158
Page count: 8
Source: International Orthopaedics, 2024, 48: 151-158