Assessment of readability, reliability, and quality of ChatGPT®, BARD®, Gemini®, Copilot®, Perplexity® responses on palliative care

Cited by: 4
Authors
Hanci, Volkan [1 ]
Ergun, Bisar [2 ]
Gul, Sanser [3 ]
Uzun, Ozcan [4 ]
Erdemir, Ismail [5 ]
Hanci, Ferid Baran [6 ]
Affiliations
[1] Sincan Educ & Res Hosp, Clin Anesthesiol & Crit Care, TR-06930 Ankara, Turkiye
[2] Dr Ismail Fehmi Cumalioglu City Hosp, Clin Internal Med & Crit Care, Tekirdag, Turkiye
[3] Ankara Ataturk Sanatory Educ & Res Hosp, Clin Neurosurg, Ankara, Turkiye
[4] Yalova City Hosp, Clin Internal Med & Nephrol, Yalova, Turkiye
[5] Dokuz Eylul Univ, Fac Med, Dept Anesthesiol & Crit Care, Izmir, Turkiye
[6] Ostim Tech Univ, Fac Engn, Artificial Intelligence Engn Dept, Ankara, Turkiye
Keywords
artificial intelligence; Bard®; ChatGPT®; Copilot®; Gemini®; online medical information; palliative care; Perplexity®; readability; HEALTH LITERACY; EDUCATION; INFORMATION;
DOI
10.1097/MD.0000000000039305
Chinese Library Classification (CLC)
R5 [Internal Medicine];
Subject Classification Codes
1002 ; 100201 ;
Abstract
No prior study has comprehensively evaluated the readability and quality of "palliative care" information provided by the artificial intelligence (AI) chatbots ChatGPT®, Bard®, Gemini®, Copilot®, and Perplexity®. Our study is an observational, cross-sectional original research study. Each of the 5 AI chatbots, ChatGPT®, Bard®, Gemini®, Copilot®, and Perplexity®, was asked to answer the 100 questions most frequently asked by patients about palliative care, and the responses from each chatbot were analyzed separately. This study did not involve any human participants. The results revealed significant differences between the readability assessments of the responses of all 5 AI chatbots (P < .05). When the different readability indexes were evaluated holistically, the responses ranked, from easiest to most difficult to read, as Bard®, Copilot®, Perplexity®, ChatGPT®, Gemini® (P < .05). The median readability indexes of the responses of each of the 5 AI chatbots were compared with the "recommended" 6th-grade reading level; statistically significant differences were observed for all formulas (P < .001), and the answers of all 5 AI chatbots were at an educational level well above the 6th-grade level. The modified DISCERN and Journal of the American Medical Association scores were highest for Perplexity® (P < .001), while Gemini® responses had the highest Global Quality Scale score (P < .001). It is emphasized that patient education materials should be written at a 6th-grade reading level. The current answers of the 5 AI chatbots evaluated, Bard®, Copilot®, Perplexity®, ChatGPT®, and Gemini®, were well above the recommended levels in terms of the readability of their text content, and their text content quality assessment scores were also low. Both the quality and the readability of the texts should be brought within the recommended limits.
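For context, one widely used index of the kind compared against the 6th-grade threshold in such studies is the Flesch-Kincaid Grade Level; this record does not enumerate the specific formulas the authors applied, so the following is illustrative only:

FKGL = 0.39 × (total words / total sentences) + 11.8 × (total syllables / total words) − 15.59

A text scoring at or below 6 on this index would meet the commonly recommended 6th-grade reading level for patient education materials.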
Pages: 9
Related Papers
50 records in total
  • [11] Performance assessment of ChatGPT 4, ChatGPT 3.5, Gemini Advanced Pro 1.5 and Bard 2.0 to problem solving in pathology in French language
    Tarris, Georges
    Martin, Laurent
    DIGITAL HEALTH, 2025, 11
  • [12] Evaluation of the reliability and readability of ChatGPT-4 responses regarding hypothyroidism during pregnancy
    Onder, C. E.
    Koc, G.
    Gokbulut, P.
    Taskaldiran, I.
    Kuskonmaz, S. M.
    SCIENTIFIC REPORTS, 2024, 14 (01)
  • [14] Assessing the Quality and Readability of ChatGPT Responses to Commonly Asked Questions in Plastic Surgery
    Keating, Muireann
    Bollard, Stephanie
    Potter, Shirley
    IRISH JOURNAL OF MEDICAL SCIENCE, 2024, 193 : S54 - S54
  • [15] Evaluating reliability, quality, and readability of ChatGPT's nutritional recommendations for women with polycystic ovary syndrome
    Ulug, Elif
    Gunesli, Irmak
    Pinar, Aylin Acikgoz
    Yildiz, Bulent Okan
    NUTRITION RESEARCH, 2025, 133 : 46 - 53
  • [16] Readability, quality and accuracy of generative artificial intelligence chatbots for commonly asked questions about labor epidurals: a comparison of ChatGPT and Bard
    Lee, D.
    Brown, M.
    Hammond, J.
    Zakowski, M.
    INTERNATIONAL JOURNAL OF OBSTETRIC ANESTHESIA, 2025, 61
  • [17] A Comparison of Prostate Cancer Screening Information Quality on Standard and Advanced Versions of ChatGPT, Google Gemini, and Microsoft Copilot: A Cross-Sectional Study
    Owens, Otis L.
    Leonard, Michael
    AMERICAN JOURNAL OF HEALTH PROMOTION, 2025
  • [18] Quality of life assessment in palliative care
    Catania, G.
    Beccaro, M.
    Costantini, M.
    Ugolini, D.
    De Silvestri, A.
    Bagnasco, A.
    Sasso, L.
    EUROPEAN JOURNAL OF ONCOLOGY NURSING, 2014, 18 : S7 - S7
  • [19] Assessment of Quality and Readability of Information Provided by ChatGPT in Relation to Anterior Cruciate Ligament Injury
    Fahy, Stephen
    Oehme, Stephan
    Milinkovic, Danko
    Jung, Tobias
    Bartek, Benjamin
    JOURNAL OF PERSONALIZED MEDICINE, 2024, 14 (01)
  • [20] Reliability and Quality of the Nursing Care Planning Texts Generated by ChatGPT
    Dagci, Mahmut
    Cam, Funda
    Dost, Ayse
    NURSE EDUCATOR, 2024, 49 (03) : E109 - E114