Comparing Vision-Capable Models, GPT-4 and Gemini, With GPT-3.5 on Taiwan's Pulmonologist Exam

Cited: 1
Authors
Chen, Chih-Hsiung [1]
Hsieh, Kuang-Yu [1]
Huang, Kuo-En [1]
Lai, Hsien-Yun [2]
Affiliations
[1] Mennonite Christian Hospital, Department of Critical Care Medicine, Hualien, Taiwan
[2] Mennonite Christian Hospital, Department of Education and Research, Hualien, Taiwan
Keywords
vision feature; pulmonologist exam; Gemini; GPT; large language models; artificial intelligence
DOI
10.7759/cureus.67641
CLC Number
R5 [Internal Medicine]
Subject Classification Codes
1002; 100201
Abstract
Introduction: The latest generation of large language models (LLMs) features multimodal capabilities, allowing the models to interpret graphics, images, and videos, which are crucial in medical fields. This study investigates the vision capabilities of two next-generation models, OpenAI's Generative Pre-trained Transformer 4 (GPT-4) and Google's Gemini.
Methods: To establish a comparative baseline, we used GPT-3.5, a text-only model, and evaluated the performance of all three models on questions from the Taiwan Specialist Board Exams in Pulmonary and Critical Care Medicine. Our dataset comprised 1,100 questions from 2013 to 2023, with 100 questions per year. Of these, 1,059 were pure text and 41 combined text with images; the majority were in a non-English language, and only six were in pure English.
Results: On each annual 100-question exam from 2013 to 2023, GPT-4 scored 66, 69, 51, 64, 72, 64, 66, 64, 63, 68, and 67, respectively. Gemini scored 45, 48, 45, 45, 46, 59, 54, 41, 53, 45, and 45, while GPT-3.5 scored 39, 33, 35, 36, 32, 33, 43, 28, 32, 33, and 36.
Conclusions: These results demonstrate that newer LLMs with vision capabilities significantly outperform the text-only model. With the passing score set at 60, GPT-4 passed most exams and approached human performance.
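The per-year scores reported above can be aggregated to verify the summary claims. Below is a minimal Python sketch, not part of the paper itself; all score values are copied from the abstract, and the pass threshold of 60 is the one the study states.

from statistics import mean

# Per-year exam scores (2013-2023) as reported in the abstract.
scores = {
    "GPT-4":   [66, 69, 51, 64, 72, 64, 66, 64, 63, 68, 67],
    "Gemini":  [45, 48, 45, 45, 46, 59, 54, 41, 53, 45, 45],
    "GPT-3.5": [39, 33, 35, 36, 32, 33, 43, 28, 32, 33, 36],
}
PASS_SCORE = 60  # passing threshold used in the study

for model, s in scores.items():
    passed = sum(score >= PASS_SCORE for score in s)
    print(f"{model:>8}: mean {mean(s):.1f}, passed {passed}/{len(s)} exams")

# Output: GPT-4 averages 64.9 and passes 10/11 exams; Gemini (47.8) and
# GPT-3.5 (34.5) pass none, consistent with the stated conclusions.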
Pages: 9
Related Papers
50 records in total
  • [21] A comparison of human, GPT-3.5, and GPT-4 performance in a university-level coding course
    Yeadon, Will
    Peach, Alex
    Testrow, Craig
    SCIENTIFIC REPORTS, 2024, 14 (01)
  • [22] The Performance of GPT-3.5, GPT-4, and Bard on the Japanese National Dentist Examination: A Comparison Study
    Ohta, Keiichi
    Ohta, Satomi
    CUREUS JOURNAL OF MEDICAL SCIENCE, 2023, 15 (12)
  • [23] Performance of GPT-4 and GPT-3.5 in generating accurate and comprehensive diagnoses across medical subspecialties
    Luk, Dik Wai Anderson
    Ip, Whitney Chin Tung
    Shea, Yat-fung
    JOURNAL OF THE CHINESE MEDICAL ASSOCIATION, 2024, 87 (03): 259-260
  • [24] Limitations of GPT-3.5 and GPT-4 in Applying Fleischner Society Guidelines to Incidental Lung Nodules
    Gamble, Joel
    Ferguson, Duncan
    Yuen, Joanna
    Sheikh, Adnan
    CANADIAN ASSOCIATION OF RADIOLOGISTS JOURNAL - JOURNAL DE L'ASSOCIATION CANADIENNE DES RADIOLOGISTES, 2024, 75 (02): 412-416
  • [25] Advancements in AI for Gastroenterology Education: An Assessment of OpenAI's GPT-4 and GPT-3.5 in MKSAP Question Interpretation
    Patel, Akash
    Samreen, Isha
    Ahmed, Imran
    AMERICAN JOURNAL OF GASTROENTEROLOGY, 2024, 119 (10S): S1580-S1580
  • [26] Assessing Generative Pretrained Transformers (GPT) in Clinical Decision-Making: Comparative Analysis of GPT-3.5 and GPT-4
    Lahat, Adi
    Sharif, Kassem
    Zoabi, Narmin
    Patt, Yonatan Shneor
    Sharif, Yousra
    Fisher, Lior
    Shani, Uria
    Arow, Mohamad
    Levin, Roni
    Klang, Eyal
    JOURNAL OF MEDICAL INTERNET RESEARCH, 2024, 26
  • [27] BI-RADS Category Assignments by GPT-3.5, GPT-4, and Google Bard: A Multilanguage Study
    Cozzi, Andrea
    Pinker, Katja
    Hidber, Andri
    Zhang, Tianyu
    Bonomo, Luca
    Lo Gullo, Roberto
    Christianson, Blake
    Curti, Marco
    Rizzo, Stefania
    Del Grande, Filippo
    Mann, Ritse M.
    Schiaffino, Simone
    RADIOLOGY, 2024, 311 (01)
  • [28] Artificial Intelligence in Ophthalmology: A Comparative Analysis of GPT-3.5, GPT-4, and Human Expertise in Answering StatPearls Questions
    Moshirfar, Majid
    Altaf, Amal W.
    Stoakes, Isabella M.
    Tuttle, Jared J.
    Hoopes, Phillip C.
    CUREUS JOURNAL OF MEDICAL SCIENCE, 2023, 15 (06)
  • [29] Performance of GPT-4 Vision on kidney pathology exam questions
    Miao, Jing
    Thongprayoon, Charat
    Cheungpasitporn, Wisit
    Cornell, Lynn D.
    AMERICAN JOURNAL OF CLINICAL PATHOLOGY, 2024, 162 (03): 220-226
  • [30] Performance of GPT-4 Vision on kidney pathology exam questions
    Daungsupawong, Hinpetch
    Wiwanitkit, Viroj
    AMERICAN JOURNAL OF CLINICAL PATHOLOGY, 2024