Evaluating the Accuracy, Comprehensiveness, and Validity of ChatGPT Compared to Evidence-Based Sources Regarding Common Surgical Conditions: Surgeons' Perspectives (Publication with Expression of Concern. See JAN, 2025)

Cited by: 2
Authors
Nasef, Hazem [1 ]
Patel, Heli [1 ]
Amin, Quratulain [1 ]
Baum, Samuel [2 ]
Ratnasekera, Asanthi [3 ]
Ang, Darwin [4 ]
Havron, William S. [5 ,6 ]
Nakayama, Don [7 ]
Elkbuli, Adel [5 ,6 ]
Affiliations
[1] NOVA Southeastern Univ, Kiran Patel Coll Allopath Med, Ft Lauderdale, FL USA
[2] Louisiana State Univ, Coll Med, Hlth Sci Ctr, New Orleans, LA USA
[3] Drexel Coll Med, Dept Surg, Newark, DE USA
[4] Ocala Reg Med Ctr, Dept Surg, Ocala, FL USA
[5] Orlando Reg Med Ctr Inc, Dept Surg Educ, Orlando, FL USA
[6] Orlando Reg Med Ctr Inc, Dept Surg, Div Trauma & Surg Crit Care, 86 W Underwood St, Orlando, FL 32806 USA
[7] Mercer Univ, Sch Med, Columbus, GA USA
Keywords
ChatGPT; clinical practice; U.S. surgeons; common surgical conditions; evidence-based medicine
DOI
10.1177/00031348241256075
Chinese Library Classification (CLC)
R61 [Operative Surgery]
Abstract
Background: This study aims to assess the accuracy, comprehensiveness, and validity of ChatGPT compared to evidence-based sources regarding the diagnosis and management of common surgical conditions by surveying the perceptions of U.S. board-certified practicing surgeons. Methods: An anonymous cross-sectional survey was distributed to U.S. practicing surgeons from June 2023 to March 2024. The survey comprised 94 multiple-choice questions evaluating diagnostic and management information for five common surgical conditions, drawn either from evidence-based sources or generated by ChatGPT. Statistical analysis included descriptive statistics and paired-sample t-tests. Results: Participating surgeons were primarily aged 40-50 years (43%), male (86%), White (57%), and had 5-10 years or >15 years of experience (86%). The majority of surgeons (86%) had no prior experience with ChatGPT in surgical practice. For material discussing acute cholecystitis and upper gastrointestinal hemorrhage, evidence-based sources were rated as significantly more comprehensive (3.57 ± 0.535 vs 2.00 ± 1.16, P = .025; 4.14 ± 0.69 vs 2.43 ± 0.98, P < .001, respectively) and more valid (3.71 ± 0.488 vs 2.86 ± 1.07, P = .045; 3.71 ± 0.76 vs 2.71 ± 0.95, P = .038, respectively) than ChatGPT. However, there was no significant difference in accuracy between the two sources (3.71 vs 3.29, P = .289; 3.57 vs 2.71, P = .111, respectively). Conclusion: Surveyed U.S. board-certified practicing surgeons rated evidence-based sources as significantly more comprehensive and valid than ChatGPT across the majority of surveyed surgical conditions, while accuracy ratings did not differ significantly for most conditions. Although ChatGPT may offer potential benefits in surgical practice, further refinement and validation are necessary to enhance its utility and acceptance among surgeons.
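The Methods describe paired-sample t-tests comparing surgeons' ratings of the same items sourced from evidence-based references versus ChatGPT. A minimal sketch of that comparison using SciPy is shown below; the rating values are synthetic placeholders for illustration only, not the study's data.

```python
# Minimal sketch of the paired-sample t-test design described in the Methods.
# The ratings below are illustrative placeholders, NOT data from the study.
from scipy import stats

# Hypothetical 1-5 Likert ratings: each surgeon rates both versions of the
# same content, so the two samples are matched pairs.
evidence_based_ratings = [4, 3, 4, 3, 4, 4, 3]
chatgpt_ratings        = [2, 1, 3, 2, 2, 3, 1]

# Paired t-test on the matched ratings.
t_stat, p_value = stats.ttest_rel(evidence_based_ratings, chatgpt_ratings)
print(f"t = {t_stat:.2f}, P = {p_value:.3f}")
```

A paired test is the appropriate choice here because each surgeon serves as their own control, rating both the evidence-based and the ChatGPT-generated material.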
Pages: 325-335
Page count: 11
Related Papers
1 result
  • [1] Evaluating the Factors Influencing Residency Match for Surgical Specialty Applicants and Programs: Challenges and Future Directions (Publication with Expression of Concern. See JAN, 2025)
    Patel, Heli; Wright, D-Dre; Hernandez, Nickolas; Werling, Alaina; Watts, Emelia; Havron, William S.; Elkbuli, Adel
    AMERICAN SURGEON, 2025, 91(03): 386-392