Chat-GPT on brain tumors: An examination of Artificial Intelligence/Machine Learning's ability to provide diagnoses and treatment plans for example neuro-oncology cases

Cited by: 6
Authors
Kozel, Giovanni [1 ]
Gurses, Muhammet Enes [2 ]
Gecici, Neslihan Nisa [3 ]
Gokalp, Elif [4 ]
Bahadir, Siyar [5 ]
Merenzon, Martin A. [6 ]
Shah, Ashish H. [2 ]
Komotar, Ricardo J. [2 ]
Ivan, Michael E. [2 ]
Affiliations
[1] Brown Univ, Warren Alpert Med Sch, Providence, RI USA
[2] Univ Miami, Miller Sch Med, Dept Neurosurg, 1475 NW 12th Ave, Miami, FL 33136 USA
[3] Hacettepe Univ, Sch Med, Ankara, Turkiye
[4] Ankara Univ, Sch Med, Ankara, Turkiye
[5] Feinstein Inst, New York, NY USA
[6] Yale Univ, Sch Med, Dept Neurosurg, New Haven, CT USA
Keywords
Neurosurgery; Artificial intelligence; ChatGPT 3.5; ChatGPT 4; Neurosurgical treatment; Survival
DOI
10.1016/j.clineuro.2024.108238
CLC number
R74 [Neurology and Psychiatry]
Discipline classification number
Abstract
Objective: To assess the ability of ChatGPT-3.5 and ChatGPT-4 to provide accurate diagnoses, treatment options, and treatment plans for brain tumors in example neuro-oncology cases.

Methods: ChatGPT-3.5 and ChatGPT-4 were each given twenty example neuro-oncology brain tumor cases selected from medical textbooks and asked to provide a diagnosis, treatment options, and a treatment plan for each case. Team members first determined for which cases ChatGPT-3.5 and ChatGPT-4 provided a correct diagnosis or treatment plan. Twenty neurosurgeons from the researchers' institution then independently rated the diagnoses, treatment options, and treatment plans provided by both programs for each of the twenty cases on a scale of one to ten, with ten being the highest score. A paired t-test on the per-case average scores was used to determine whether the difference between ChatGPT-3.5 and ChatGPT-4 was statistically significant.

Results: In the initial analysis of correct responses, ChatGPT-4 achieved 85% accuracy in diagnosing the example brain tumors and 75% accuracy in its treatment plans, whereas ChatGPT-3.5 achieved only 65% and 10%, respectively. The average scores given by the twenty independent neurosurgeons to ChatGPT-4 for accuracy of diagnosis, treatment options, and treatment plan were 8.3, 8.4, and 8.5 out of 10, respectively, versus 5.9, 5.7, and 5.7 for ChatGPT-3.5. Each difference in average score was statistically significant on a paired t-test (p < 0.001).

Conclusions: ChatGPT-4 demonstrates considerable promise as a diagnostic tool for brain tumors in neuro-oncology, as reflected in its performance in this study and in its assessment by the surveyed neurosurgeon reviewers.
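The sketch below illustrates the kind of paired t-test described in the abstract, comparing per-case average reviewer scores for the two models. The score arrays are hypothetical placeholders generated for illustration only, not the study's data; only the procedure (a paired test over the same twenty cases, here via scipy.stats.ttest_rel) mirrors the analysis described.

# Minimal sketch of the paired t-test described above.
# The scores are hypothetical placeholders, NOT the study's data: each entry
# stands in for the average 1-10 rating given by the 20 reviewers to one of
# the 20 example cases, paired across ChatGPT-3.5 and ChatGPT-4.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
avg_scores_gpt35 = np.clip(rng.normal(5.8, 1.0, size=20), 1, 10)  # hypothetical per-case means
avg_scores_gpt4 = np.clip(rng.normal(8.4, 0.7, size=20), 1, 10)   # hypothetical per-case means

# Paired test: both models were rated on the same 20 cases, so scores pair by case.
t_stat, p_value = stats.ttest_rel(avg_scores_gpt4, avg_scores_gpt35)
print(f"t = {t_stat:.2f}, p = {p_value:.4g}")

With real data, each array would hold the mean of the twenty neurosurgeons' ratings for one case, and a separate test would be run for each assessment category (diagnosis, treatment options, treatment plan).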
Pages: 5