Evaluation of the reliability and readability of ChatGPT-4 responses regarding hypothyroidism during pregnancy

Cited by: 19
Authors
Onder, C. E. [1]
Koc, G. [1]
Gokbulut, P. [1]
Taskaldiran, I. [1]
Kuskonmaz, S. M. [1]
Affiliations
[1] Ankara Numune Training & Res Hosp, Dept Endocrinol & Metab Dis, Ankara, Turkiye
Keywords
HEALTH INFORMATION;
DOI
10.1038/s41598-023-50884-w
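Chinese Library Classification number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code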
07 ; 0710 ; 09 ;
Abstract
Hypothyroidism is characterized by thyroid hormone deficiency and has adverse effects on both pregnancy and fetal health. Chat Generative Pre-trained Transformer (ChatGPT) is a large language model trained on a very large database drawn from many sources. Our study aimed to evaluate the reliability and readability of ChatGPT-4 answers about hypothyroidism in pregnancy. A total of 19 questions were created in line with the recommendations in the latest guideline of the American Thyroid Association (ATA) on hypothyroidism in pregnancy and were posed to ChatGPT-4. The reliability and quality of the responses were scored by two independent researchers using the global quality scale (GQS) and modified DISCERN (mDISCERN) tools. The readability of the ChatGPT-4 responses was assessed using the Flesch Reading Ease (FRE) score, Flesch-Kincaid grade level (FKGL), Gunning Fog Index (GFI), Coleman-Liau Index (CLI), and Simple Measure of Gobbledygook (SMOG) tools. No misleading information was found in any of the answers. The mean mDISCERN score of the responses was 30.26 ± 3.14; the median GQS score was 4 (2-4). In terms of reliability, most of the answers showed moderate reliability (78.9%), followed by good reliability (21.1%). In the readability analysis, the median FRE was 32.20 (13.00-37.10). The years of education required to read the answers were most often at the university level [9 (47.3%)]. ChatGPT-4 has significant potential and can be used as an auxiliary information source for counseling, creating a bridge between patients and clinicians on hypothyroidism in pregnancy. Efforts should be made to improve the reliability and readability of ChatGPT.
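The Flesch Reading Ease and Flesch-Kincaid grade level scores named in the abstract follow standard published formulas. As a minimal sketch (not the authors' tooling, which the abstract does not describe), they can be computed from word, sentence, and syllable counts. The function names and the sample counts below are illustrative assumptions, not values from the study.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease formula; higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch-Kincaid grade level; approximates a US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical counts for illustration only (not data from the study).
fre = flesch_reading_ease(words=100, sentences=5, syllables=150)
fkgl = flesch_kincaid_grade(words=100, sentences=5, syllables=150)
print(f"FRE: {fre:.2f}, FKGL: {fkgl:.2f}")
```

On the usual interpretation of the FRE scale, scores between 30 and 50 correspond to difficult, college-level text, which is consistent with the median FRE of 32.20 and the university-level reading requirement reported in the abstract.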
Pages: 8