Chat Generative Pretrained Transformer (ChatGPT) and Bard: Artificial Intelligence Does not yet Provide Clinically Supported Answers for Hip and Knee Osteoarthritis

Cited by: 9

Authors:

Yang, Jaewon [1, 5]

Ardavanis, Kyle S. [2]

Slack, Katherine E. [3]

Fernando, Navin D. [1]

Della Valle, Craig J. [4]

Hernandez, Nicholas M. [1]

Affiliations:

[1] Univ Washington, Dept Orthopaed Surg, Seattle, WA USA

[2] Madigan Army Med Ctr, Dept Orthopaed Surg, Tacoma, WA 98431 USA

[3] Washington State Univ, Elson S Floyd Coll Med, Spokane, WA USA

[4] Rush Univ, Dept Orthopaed Surg, Med Ctr, Chicago, IL USA

[5] Univ Washington, Dept Orthopaed & Sports Med, Seattle, WA 98104 USA

Source:

JOURNAL OF ARTHROPLASTY | 2024 / Volume 39 / Issue 05

Keywords:

ChatGPT; bard; machine learning; artificial intelligence; large language models; LEARNING ALGORITHM; COMPLICATIONS; ARTHROPLASTY;

DOI:

10.1016/j.arth.2024.01.029

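Chinese Library Classification (CLC) number:

R826.8 [Plastic surgery]; R782.2 [Oral and maxillofacial plastic surgery]; R726.2 [Pediatric plastic surgery]; R62 [Plastic surgery (reconstructive surgery)];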

Abstract:

Background: Advancements in artificial intelligence (AI) have led to the creation of large language models (LLMs), such as Chat Generative Pretrained Transformer (ChatGPT) and Bard, that analyze online resources to synthesize responses to user queries. Despite their popularity, the accuracy of LLM responses to medical questions remains unknown. This study aimed to compare the responses of ChatGPT and Bard regarding treatments for hip and knee osteoarthritis with the American Academy of Orthopaedic Surgeons (AAOS) Evidence-Based Clinical Practice Guidelines (CPGs) recommendations. Methods: Both ChatGPT (OpenAI) and Bard (Google) were queried regarding 20 treatments (10 for hip and 10 for knee osteoarthritis) from the AAOS CPGs. Responses were classified by 2 reviewers as being in "Concordance," "Discordance," or "No Concordance" with AAOS CPGs. A Cohen's kappa coefficient was used to assess inter-rater reliability, and chi-squared analyses were used to compare responses between LLMs. Results: Overall, ChatGPT and Bard provided responses that were concordant with the AAOS CPGs for 16 (80%) and 12 (60%) treatments, respectively. Notably, ChatGPT and Bard encouraged the use of nonrecommended treatments in 30% and 60% of queries, respectively. There were no differences in performance when evaluating by joint or by recommended versus non-recommended treatments. Studies were referenced in 6 (30%) of the Bard responses and none (0%) of the ChatGPT responses. Of the 6 Bard responses, studies could only be identified for 1 (16.7%). Of the remaining, 2 (33.3%) responses cited studies in journals that did not exist, 2 (33.3%) cited studies that could not be found with the information given, and 1 (16.7%) provided links to unrelated studies. Conclusions: Both ChatGPT and Bard do not consistently provide responses that align with the AAOS CPGs. Consequently, physicians and patients should temper expectations on the guidance AI platforms can currently provide.
(c) 2024 Elsevier Inc. All rights reserved.
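
The two statistics named in the Methods can be illustrated with a minimal sketch in plain Python. This is not the authors' analysis code. The function names and the five example ratings are hypothetical, and the chi-squared test is computed without a continuity correction, which is an assumption because the abstract does not say which variant was used. Only the counts 16/20 and 12/20 come from the abstract.

```python
from collections import Counter
from math import erfc, sqrt


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    return stat, erfc(sqrt(stat / 2.0))


# Hypothetical ratings (C = Concordance, D = Discordance, N = No Concordance);
# the abstract does not report the reviewers' individual ratings.
reviewer_1 = ["C", "C", "D", "N", "C"]
reviewer_2 = ["C", "C", "D", "C", "C"]
print(f"kappa = {cohen_kappa(reviewer_1, reviewer_2):.3f}")

# Counts from the abstract: concordant vs. not concordant, out of 20 each.
stat, p = chi_squared_2x2(16, 4, 12, 8)
print(f"chi-squared = {stat:.2f}, p = {p:.3f}")
```

With the abstract's overall counts, the uncorrected statistic is roughly 1.90 on 1 degree of freedom, which corresponds to a p value above 0.05. A corrected or exact test would give a different figure, so this should not be read as the value reported in the paper.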

Pages: 1184-1190

Number of pages: 7
