Evaluation of the reliability and readability of ChatGPT-4 responses regarding hypothyroidism during pregnancy

Cited by: 19

Authors:

Onder, C. E. [1]

Koc, G. [1]

Gokbulut, P. [1]

Taskaldiran, I. [1]

Kuskonmaz, S. M. [1]

Affiliations:

[1] Ankara Numune Training & Res Hosp, Dept Endocrinol & Metab Dis, Ankara, Turkiye

Source:

SCIENTIFIC REPORTS | 2024 / Volume 14 / Issue 01

Keywords:

HEALTH INFORMATION;

DOI:

10.1038/s41598-023-50884-w

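Chinese Library Classification (CLC) number:

O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];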

Discipline classification code:

07 ; 0710 ; 09 ;

Abstract:

Hypothyroidism is characterized by thyroid hormone deficiency and has adverse effects on both pregnancy and fetal health. Chat Generative Pre-trained Transformer (ChatGPT) is a large language model trained on a very large database drawn from many sources. Our study aimed to evaluate the reliability and readability of ChatGPT-4 answers about hypothyroidism in pregnancy. A total of 19 questions were created in line with the recommendations in the latest guideline of the American Thyroid Association (ATA) on hypothyroidism in pregnancy and were posed to ChatGPT-4. The reliability and quality of the responses were scored by two independent researchers using the global quality scale (GQS) and modified DISCERN (mDISCERN) tools. The readability of the ChatGPT-4 responses was assessed using the Flesch Reading Ease (FRE) score, Flesch-Kincaid grade level (FKGL), Gunning Fog Index (GFI), Coleman-Liau Index (CLI), and Simple Measure of Gobbledygook (SMOG) tools. No misleading information was found in any of the answers. The mean mDISCERN score of the responses was 30.26 ± 3.14; the median GQS score was 4 (2-4). In terms of reliability, most of the answers showed moderate reliability (78.9%), followed by good reliability (21.1%). In the readability analysis, the median FRE was 32.20 (13.00-37.10). The education level required to read the answers was most often university level [9 (47.3%)]. ChatGPT-4 has significant potential and can be used as an auxiliary information source for counseling, creating a bridge between patients and clinicians about hypothyroidism in pregnancy. Efforts should be made to improve the reliability and readability of ChatGPT.
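
For orientation, two of the readability measures named in the abstract can be sketched with their standard published formulas. This is only an illustrative sketch, not the procedure the authors used: the abstract does not say which tool or software computed the scores, and the function names and the example counts below are assumptions made for illustration.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease formula; higher means easier to read."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch-Kincaid grade level (approximate US school grade)."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical counts for a short passage (not data from the study).
words, sentences, syllables = 100, 5, 150
print(round(flesch_reading_ease(words, sentences, syllables), 3))
print(round(flesch_kincaid_grade(words, sentences, syllables), 2))
```

Both functions take word, sentence, and syllable counts as inputs, so the result depends on how those are counted; syllable counting in particular varies between tools. On the FRE scale, scores in the 30s, such as the median of 32.20 reported above, are conventionally read as difficult text.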


Pages: 8

50 items in total

[41] Evaluation of the quality and reliability of ChatGPT and Perplexity's responses about rectal cancer
Yazici, O.
Yucel, K. Bir
Sutcuoglu, O.
ANNALS OF ONCOLOGY, 2023, 34 : S1090 - S1090
[42] Evaluation of ChatGPT-4's Performance in Therapeutic Decision-Making During Multidisciplinary Oncology Meetings for Head and Neck Squamous Cell Carcinoma
Alami, Kenza
Willemse, Esther
Quiriny, Marie
Lipski, Samuel
Laurent, Celine
Donquier, Vincent
Digonnet, Antoine
CUREUS JOURNAL OF MEDICAL SCIENCE, 2024, 16 (09)
[43] Exploring Multimodal Large Language Models ChatGPT-4 and Bard for Visual Complexity Evaluation of Mobile User Interfaces
Akca, Eren
Tanriover, Omer Ozgur
TRAITEMENT DU SIGNAL, 2024, 41 (05) : 2673 - 2681
[44] Re: Large language models (LLMs) in evaluation of emergency radiology reports: performance of ChatGPT-4, Perplexity and Bard
Wiwanitkit, S.
Wiwanitkit, V.
CLINICAL RADIOLOGY, 2024, 79 (04)
[45] Assessment of the Responses of the Artificial Intelligence-based Chatbot ChatGPT-4 to Frequently Asked Questions About Amblyopia and Childhood Myopia
Nikdel, Mojgan
Ghadimi, Hadi
Tavakoli, Mehdi
Suh, Donny W.
JOURNAL OF PEDIATRIC OPHTHALMOLOGY & STRABISMUS, 2024, 61 (02) : 86 - 89
[46] The performance of OpenAI ChatGPT-4 and Google Gemini in virology multiple-choice questions: a comparative analysis of English and Arabic responses
Sallam, Malik
Al-Mahzoum, Kholoud
Almutawaa, Rawan Ahmad
Alhashash, Jasmen Ahmad
Dashti, Retaj Abdullah
Alsafy, Danah Raed
Almutairi, Reem Abdullah
Barakat, Muna
BMC RESEARCH NOTES, 2024, 17 (01)
[47] Utilizing artificial intelligence in academic writing: an in-depth evaluation of a scientific review on fertility preservation written by ChatGPT-4
Safrai, Myriam
Orwig, Kyle E.
JOURNAL OF ASSISTED REPRODUCTION AND GENETICS, 2024, 41 (07) : 1871 - 1880
[48] Re: Re: Large language models (LLMs) in evaluation of emergency radiology reports: performance of ChatGPT-4, Perplexity, and Bard
Amato, Infante
CLINICAL RADIOLOGY, 2024, 79 (07)
[49] Evaluating the Accuracy and Readability of ChatGPT-4o's Responses to Patient-Based Questions about Keratoconus
Balci, Ali Safa
Cakmak, Semih
OPHTHALMIC EPIDEMIOLOGY, 2025,
[50] Reliability and Readability evaluation of chatbots responses as a patient information resource for the most common PET/CT scans
Aydinbelge-Dizdar, N.
Dizdar, K.
REVISTA ESPANOLA DE MEDICINA NUCLEAR E IMAGEN MOLECULAR, 2025, 44 (01):
