Does Google's Bard Chatbot perform better than ChatGPT on the European hand surgery exam?

Cited by: 9

Authors:

Thibaut, Goetsch ^{[1]}

Dabbagh, Armaghan ^{[2]}

Liverneaux, Philippe ^{[3,4]}

Affiliations:

[1] Strasbourg Univ Hosp, Dept Publ Hlth, FMTS, GMRC, 1 Ave lhopital, F-67000 Strasbourg, France

[2] Univ Toronto, Fac Med, Toronto, ON, Canada

[3] Strasbourg Univ, ICube, UMR7357, CNRS, 2-4 rue Boussingault, F-67000 Strasbourg, France

[4] Strasbourg Univ Hosp, Dept Hand Surg, FMTS, 1 Ave Moliere, F-67200 Strasbourg, France

Source:

INTERNATIONAL ORTHOPAEDICS | 2023 / Vol. 48 / Issue 1

Keywords:

Bard; ChatGPT; Chatbot; Hand Surgery; Multiple-choice question; Artificial intelligence; ARTIFICIAL-INTELLIGENCE; BOARD;

DOI:

10.1007/s00264-023-06034-y

Chinese Library Classification (CLC) number:

R826.8 [Plastic surgery]; R782.2 [Oral and maxillofacial plastic surgery]; R726.2 [Pediatric plastic surgery]; R62 [Plastic surgery (reconstructive surgery)];

Discipline classification code:

Abstract:

Purpose: According to previous research, the chatbot ChatGPT® V3.5 was unable to pass the first part of the European Board of Hand Surgery (EBHS) diploma examination. This study aimed to investigate whether Google's chatbot Bard® would perform better than ChatGPT on the EBHS diploma examination. Methods: Chatbots were asked to answer 18 EBHS multiple-choice questions (MCQs) published in the Journal of Hand Surgery (European Volume) in five trials (A1 to A5). After A3, chatbots received correct answers, and after A4, incorrect answers. Their ability to modify their responses was then measured and compared. Results: Bard® scored 3/18 (A1), 1/18 (A2), 4/18 (A3) and 2/18 (A4 and A5). The average percentage of correct answers was 61.1% for A1, 62.2% for A2, 64.4% for A3, 65.6% for A4, 63.3% for A5 and 63.3% for all trials combined. Agreement was moderate from A1 to A5 (kappa = 0.62 (95% CI = [0.51; 0.73])) as well as from A1 to A3 (kappa = 0.60 (95% CI = [0.47; 0.74])). The formulation of Bard® responses was homogeneous, but its learning capacity is still developing. Conclusions: The main hypothesis of our study was not confirmed, since Bard did not score significantly higher than ChatGPT when answering the MCQs of the EBHS diploma exam. In conclusion, neither ChatGPT® nor Bard®, in their current versions, can pass the first part of the EBHS diploma exam.


Pages: 151-158

Page count: 8
