Evaluation of the Current Status of Artificial Intelligence for Endourology Patient Education: A Blind Comparison of ChatGPT and Google Bard Against Traditional Information Resources

Cited by: 1
Authors
Connors, Christopher [1 ]
Gupta, Kavita [1 ]
Khusid, Johnathan A. [1 ]
Khargi, Raymond [1 ]
Yaghoubian, Alan J. [2 ]
Levy, Micah [1 ]
Gallante, Blair [1 ]
Atallah, William [1 ]
Gupta, Mantu [1 ]
Affiliations
[1] Icahn Sch Med Mt Sinai, Dept Urol, 1 Gustave L Levy Pl, New York, NY 10029 USA
[2] Univ Calif Los Angeles, David Geffen Sch Med, Dept Urol, Los Angeles, CA USA
Keywords
artificial intelligence; endourology; patient education; ChatGPT; Bard; Urology Care Foundation
DOI
10.1089/end.2023.0696
Chinese Library Classification (CLC)
R5 [Internal Medicine]; R69 [Urology (Urogenital Diseases)]
Subject Classification Code
1002; 100201
Abstract
Introduction: Artificial intelligence (AI) platforms such as ChatGPT and Bard are increasingly used to answer patients' health care questions. We present the first study to blindly evaluate AI-generated responses to common endourology patient questions against official patient education materials.

Methods: Thirty-two questions and answers spanning kidney stones, ureteral stents, benign prostatic hyperplasia (BPH), and upper tract urothelial carcinoma were extracted from official Urology Care Foundation (UCF) patient education documents. The same questions were input into ChatGPT 4.0 and Bard, with responses limited to within ±10% of the word count of the corresponding UCF answer to ensure a fair comparison. Six endourologists blindly rated the responses from each platform on Likert scales for accuracy, clarity, comprehensiveness, and patient utility, and identified which response they believed was not AI generated. Finally, the Flesch-Kincaid Reading Grade Level formula was used to assess the readability of each platform's responses. Ratings were compared using analysis of variance (ANOVA) and chi-square tests.

Results: ChatGPT responses were rated highest across all categories (accuracy, comprehensiveness, clarity, and patient utility), while UCF answers were consistently scored lowest (all p < 0.01). A subanalysis showed that this trend held across question categories (e.g., kidney stones, BPH). However, AI-generated responses were more likely to be written at an advanced reading level, whereas UCF responses were more readable (college or higher reading level: ChatGPT = 100%, Bard = 66%, UCF = 19%; p < 0.001). When asked to identify the answer that was not AI generated, reviewers chose ChatGPT in 54.2% of responses and Bard in 26.6%; only 19.3% correctly identified the UCF response.

Conclusions: In a blind evaluation, AI-generated responses from ChatGPT and Bard surpassed the quality of official endourology patient education materials, suggesting that current AI platforms are already a reliable resource for basic urologic care information. AI-generated responses do, however, tend to require a higher reading level, which may limit their accessibility to a broader audience.
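The readability and statistical methods described above are straightforward to reproduce. The following is a minimal Python sketch, not taken from the paper: it implements the standard Flesch-Kincaid Grade Level formula (0.39 * words per sentence + 11.8 * syllables per word - 15.59), the ±10% word-count window used to match AI responses to UCF answers, and a one-way ANOVA over hypothetical Likert ratings. The vowel-group syllable counter and all sample data are illustrative assumptions; the authors do not specify their implementation.

    import re
    from scipy.stats import f_oneway  # one-way ANOVA, as used in the paper

    def count_syllables(word):
        # Naive heuristic (assumption): one syllable per vowel group, minimum 1.
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    def flesch_kincaid_grade(text):
        # Standard formula: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59
        sentences = max(1, len(re.findall(r"[.!?]+", text)))
        words = re.findall(r"[A-Za-z']+", text)
        if not words:
            return 0.0
        syllables = sum(count_syllables(w) for w in words)
        return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59

    def within_word_limit(candidate, reference, tol=0.10):
        # Enforce the +/-10% word-count window relative to the UCF answer.
        n_ref = len(reference.split())
        return abs(len(candidate.split()) - n_ref) <= tol * n_ref

    if __name__ == "__main__":
        sample = "Kidney stones form when minerals crystallize in concentrated urine."
        print(f"Grade level: {flesch_kincaid_grade(sample):.1f}")
        # Hypothetical Likert ratings (1-5) from six blinded reviewers:
        chatgpt = [5, 4, 5, 4, 5, 4]
        bard = [4, 4, 3, 4, 4, 3]
        ucf = [3, 3, 4, 3, 2, 3]
        f_stat, p_value = f_oneway(chatgpt, bard, ucf)
        print(f"ANOVA: F = {f_stat:.2f}, p = {p_value:.4f}")

Under this scoring, a grade of 13 or above corresponds to the "college or higher" reading level reported in the Results.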
Pages: 9