Evaluating the accuracy and relevance of ChatGPT responses to frequently asked questions regarding total knee replacement

Cited by: 7

Authors:

Zhang, Siyuan [1]

Liau, Zi Qiang Glen [1]

Tan, Kian Loong Melvin [1]

Chua, Wei Liang [1]

Affiliations:

[1] Natl Univ Hlth Syst, Dept Orthopaed Surg, Level 11, NUHS Tower Block, 1E Kent Ridge Rd, Singapore 119228, Singapore

Source:

KNEE SURGERY & RELATED RESEARCH | 2024 / Vol. 36 / Issue 01

Keywords:

ChatGPT; Artificial intelligence; Chatbot; Large language model; Total knee replacement; Total knee arthroplasty; Arthroplasty

DOI:

10.1186/s43019-024-00218-5

Chinese Library Classification (CLC) number:

R826.8 [Plastic surgery]; R782.2 [Oral and maxillofacial plastic surgery]; R726.2 [Pediatric plastic surgery]; R62 [Plastic surgery (reconstructive surgery)]

Abstract:

Background: Chat Generative Pretrained Transformer (ChatGPT), a generative artificial intelligence chatbot, may have broad applications in healthcare delivery and patient education due to its ability to provide human-like responses to a wide range of patient queries. However, there is limited evidence regarding its ability to provide reliable and useful information on orthopaedic procedures. This study seeks to evaluate the accuracy and relevance of responses provided by ChatGPT to frequently asked questions (FAQs) regarding total knee replacement (TKR).

Methods: A list of 50 clinically-relevant FAQs regarding TKR was collated. Each question was individually entered as a prompt to ChatGPT (version 3.5), and the first response generated was recorded. Responses were then reviewed by two independent orthopaedic surgeons and graded on a Likert scale for their factual accuracy and relevance. These responses were then classified into accurate versus inaccurate and relevant versus irrelevant responses using preset thresholds on the Likert scale.

Results: Most responses were accurate, while all responses were relevant. Of the 50 FAQs, 44/50 (88%) of ChatGPT responses were classified as accurate, achieving a mean Likert grade of 4.6/5 for factual accuracy. On the other hand, 50/50 (100%) of responses were classified as relevant, achieving a mean Likert grade of 4.9/5 for relevance.

Conclusion: ChatGPT performed well in providing accurate and relevant responses to FAQs regarding TKR, demonstrating great potential as a tool for patient education. However, it is not infallible and can occasionally provide inaccurate medical information. Patients and clinicians intending to utilize this technology should be mindful of its limitations and ensure adequate supervision and verification of information provided.

Pages: 8
