Assessment of readability, reliability, and quality of ChatGPT®, BARD®, Gemini®, Copilot®, Perplexity® responses on palliative care

Times Cited: 4
Authors
Hanci, Volkan [1]
Ergun, Bisar [2]
Gul, Sanser [3]
Uzun, Ozcan [4]
Erdemir, Ismail [5]
Hanci, Ferid Baran [6]
Affiliations
[1] Sincan Educ & Res Hosp, Clin Anesthesiol & Crit Care, TR-06930 Ankara, Turkiye
[2] Dr Ismail Fehmi Cumalioglu City Hosp, Clin Internal Med & Crit Care, Tekirdag, Turkiye
[3] Ankara Ataturk Sanatory Educ & Res Hosp, Clin Neurosurg, Ankara, Turkiye
[4] Yalova City Hosp, Clin Internal Med & Nephrol, Yalova, Turkiye
[5] Dokuz Eylul Univ, Fac Med, Dept Anesthesiol & Crit Care, Izmir, Turkiye
[6] Ostim Tech Univ, Fac Engn, Artificial Intelligence Engn Dept, Ankara, Turkiye
Keywords
artificial intelligence; Bard®; ChatGPT®; Copilot®; Gemini®; online medical information; palliative care; Perplexity®; readability; HEALTH LITERACY; EDUCATION; INFORMATION
DOI
10.1097/MD.0000000000039305
CLC Number
R5 [Internal Medicine]
Subject Classification Codes
1002; 100201
Abstract
No study has comprehensively evaluated the readability and quality of "palliative care" information provided by the artificial intelligence (AI) chatbots ChatGPT®, Bard®, Gemini®, Copilot®, and Perplexity®. Our study is an observational, cross-sectional original research study. The AI chatbots ChatGPT®, Bard®, Gemini®, Copilot®, and Perplexity® were each asked to answer the 100 questions most frequently asked by patients about palliative care, and the responses from each of the 5 chatbots were analyzed separately. The study did not involve any human participants. The results revealed significant differences between the readability assessments of the responses of the 5 AI chatbots (P < .05). When the different readability indexes were evaluated holistically, the readability of the chatbot responses ranked, from easiest to most difficult, as Bard®, Copilot®, Perplexity®, ChatGPT®, Gemini® (P < .05). The median readability indexes of the responses of each of the 5 AI chatbots (Bard®, Copilot®, Perplexity®, ChatGPT®, Gemini®) were also compared with the "recommended" 6th-grade reading level; statistically significant differences were observed for all formulas (P < .001), and the answers of all 5 chatbots were written at an educational level well above the 6th grade. The modified DISCERN and Journal of the American Medical Association scores were highest for Perplexity® (P < .001), and Gemini® responses had the highest Global Quality Scale score (P < .001). It is emphasized that patient education materials should be written at a 6th-grade reading level. The current answers on palliative care of the 5 AI chatbots evaluated (Bard®, Copilot®, Perplexity®, ChatGPT®, Gemini®) were well above the recommended readability level, and their text content quality assessment scores were also low. Both the quality and the readability of these texts should be brought within the recommended limits.
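The "6th-grade reading level" threshold refers to grade-level readability formulas of the kind the study applies to chatbot answers. The abstract does not list the exact indexes used, so the sketch below is only an illustrative example, assuming the widely used Flesch-Kincaid Grade Level formula; the sample answer text and the rough syllable counter are hypothetical and not taken from the study.

import re

def count_syllables(word: str) -> int:
    # Rough vowel-group syllable count; adequate for illustration only.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text: str) -> float:
    # FKGL = 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words)) - 15.59)

# Hypothetical chatbot answer scored against the 6th-grade target.
answer = ("Palliative care is specialized medical care for people living with "
          "a serious illness. It focuses on relief from symptoms and stress.")
print(f"FKGL: {flesch_kincaid_grade(answer):.1f} (recommended: grade 6 or below)")

A score above 6 under this kind of formula corresponds to text that requires more than a 6th-grade education to read comfortably, which is the sense in which the study reports the chatbot answers as exceeding the recommended level.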
Pages: 9