Evaluating the performance of Generative Pre-trained Transformer-4 (GPT-4) in standardizing radiology reports

Cited by: 22
Authors
Hasani, Amir M. [1 ]
Singh, Shiva [2 ]
Zahergivar, Aryan [2 ]
Ryan, Beth [3 ]
Nethala, Daniel [3 ]
Bravomontenegro, Gabriela [3 ]
Mendhiratta, Neil [3 ]
Ball, Mark [3 ]
Farhadi, Faraz [2 ]
Malayeri, Ashkan [2 ]
Affiliations
[1] NHLBI, Lab Translat Res, NIH, Bethesda, MD USA
[2] NIH, Radiol & Imaging Sci Dept, Clin Ctr, Bethesda, MD 20892 USA
[3] NCI, Urol Oncol Branch, NIH, Bethesda, MD USA
Funding
National Institutes of Health (USA);
Keywords
Artificial intelligence; Natural language processing; Digital health; Machine learning;
DOI
10.1007/s00330-023-10384-x
Chinese Library Classification
R8 [Special Medicine]; R445 [Diagnostic Imaging];
Discipline classification codes
1002; 100207; 1009;
Abstract
Objective: Radiology reporting is an essential component of clinical diagnosis and decision-making. With the advent of advanced artificial intelligence (AI) models such as GPT-4 (Generative Pre-trained Transformer 4), there is growing interest in evaluating their potential for optimizing or generating radiology reports. This study aimed to compare the quality and content of radiologist-generated and GPT-4-generated radiology reports.
Methods: A comparative study design was employed in which 100 anonymized radiology reports were randomly selected and analyzed. Each report was processed by GPT-4 to produce a corresponding AI-generated report. Quantitative and qualitative analysis techniques were used to assess similarities and differences between the two sets of reports.
Results: The AI-generated reports showed quality comparable to the radiologist-generated reports in most categories. Significant differences were observed in clarity (p = 0.027), ease of understanding (p = 0.023), and structure (p = 0.050), favoring the AI-generated reports. The AI-generated reports were more concise, with 34.53 fewer words and 174.22 fewer characters on average, but showed greater variability in sentence length. Content similarity was high, with an average cosine similarity of 0.85, SequenceMatcher similarity of 0.52, BLEU score of 0.5008, and BERTScore F1 of 0.8775.
Conclusion: The results of this proof-of-concept study suggest that GPT-4 can be a reliable tool for generating standardized radiology reports, offering potential benefits such as improved efficiency, better communication, and simplified data extraction and analysis. However, limitations and ethical implications must be addressed to ensure the safe and effective implementation of this technology in clinical practice.
Clinical relevance statement: The findings of this study suggest that GPT-4 (Generative Pre-trained Transformer 4), an advanced AI model, has the potential to contribute significantly to the standardization and optimization of radiology reporting, offering improved efficiency and communication in clinical practice.
Key Points:
• Large language model-generated radiology reports exhibited high content similarity and moderate structural resemblance to radiologist-generated reports.
• Performance metrics highlighted strong matching of word selection and order, as well as high semantic similarity, between AI- and radiologist-generated reports.
• The large language model demonstrated potential for generating standardized radiology reports, improving efficiency and communication in clinical settings.
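For illustration only (this is not code from the study): the sketch below shows one plausible way report-to-report similarity metrics of the kind reported above could be computed in Python for a single radiologist/GPT-4 report pair, with the study averaging such scores over its 100 report pairs. The sample report strings, the use of TF-IDF vectors for the cosine similarity, and the choice of scikit-learn, NLTK, and the bert-score package are assumptions, not details taken from the paper.

```python
from difflib import SequenceMatcher

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical example pair; the study used 100 anonymized clinical reports.
radiologist_report = "The liver is normal in size and contour. No focal hepatic lesion is seen."
gpt4_report = "Liver: normal in size and contour, without focal lesions."

# Cosine similarity between TF-IDF vectors of the two reports (bag-of-words view).
tfidf = TfidfVectorizer().fit_transform([radiologist_report, gpt4_report])
cos_sim = cosine_similarity(tfidf[0], tfidf[1])[0, 0]

# Character-level similarity ratio from Python's difflib.SequenceMatcher.
seq_sim = SequenceMatcher(None, radiologist_report, gpt4_report).ratio()

# Sentence-level BLEU, treating the radiologist report as the single reference.
bleu = sentence_bleu(
    [radiologist_report.split()],
    gpt4_report.split(),
    smoothing_function=SmoothingFunction().method1,
)

print(f"Cosine similarity:          {cos_sim:.2f}")
print(f"SequenceMatcher similarity: {seq_sim:.2f}")
print(f"BLEU score:                 {bleu:.4f}")

# BERTScore F1 (semantic similarity) can be computed analogously with the
# bert-score package, e.g. bert_score.score([gpt4_report], [radiologist_report], lang="en").
```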
Pages: 3566-3574
Number of pages: 9
Related papers
50 records in total
  • [21] GPT-NAS: Neural Architecture Search Meets Generative Pre-Trained Transformer Model
    Yu, Caiyang
    Liu, Xianggen
    Wang, Yifan
    Liu, Yun
    Feng, Wentao
    Deng, Xiong
    Tang, Chenwei
    Lv, Jiancheng
    BIG DATA MINING AND ANALYTICS, 2025, 8 (01): : 45 - 64
  • [22] GPT-LS: Generative Pre-Trained Transformer with Offline Reinforcement Learning for Logic Synthesis
    Lv, Chenyang
    Wei, Ziling
    Qian, Weikang
    Ye, Junjie
    Feng, Chang
    He, Zhezhi
    2023 IEEE 41ST INTERNATIONAL CONFERENCE ON COMPUTER DESIGN, ICCD, 2023, : 320 - 326
  • [23] Towards JavaScript program repair with Generative Pre-trained Transformer (GPT-2)
    Lajko, Mark
    Csuvik, Viktor
    Vidacs, Laszlo
    INTERNATIONAL WORKSHOP ON AUTOMATED PROGRAM REPAIR (APR 2022), 2022, : 61 - 68
  • [24] GPT4MIA: Utilizing Generative Pre-trained Transformer (GPT-3) as a Plug-and-Play Transductive Model for Medical Image Analysis
    Zhang, Yizhe
    Chen, Danny Z.
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2023 WORKSHOPS, 2023, 14393 : 151 - 160
  • [25] Generative pre-trained transformer 4o (GPT-4o) in solving text-based multiple response questions for European Diploma in Radiology (EDiR): a comparative study with radiologists
    Jakub Pristoupil
    Laura Oleaga
    Vanesa Junquero
    Cristina Merino
    Ozbek Suha Sureyya
    Martin Kyncl
    Andrea Burgetova
    Lukas Lambert
    Insights into Imaging, 16 (1)
  • [26] HELM-GPT: de novo macrocyclic peptide design using generative pre-trained transformer
    Xu, Xiaopeng
    Xu, Chencheng
    He, Wenjia
    Wei, Lesong
    Li, Haoyang
    Zhou, Juexiao
    Zhang, Ruochi
    Wang, Yu
    Xiong, Yuanpeng
    Gao, Xin
    BIOINFORMATICS, 2024, 40 (06)
  • [27] Considering the possibilities and pitfalls of Generative Pre-trained Transformer 3 (GPT-3) in healthcare delivery
    Diane M. Korngiebel
    Sean D. Mooney
    npj Digital Medicine, 4
  • [28] Medical image Generative Pre-Trained Transformer (MI-GPT): future direction for precision medicine
    Xiaohui Zhang
    Yan Zhong
    Chentao Jin
    Daoyan Hu
    Mei Tian
    Hong Zhang
    European Journal of Nuclear Medicine and Molecular Imaging, 2024, 51 : 332 - 335
  • [29] Generative Pre-trained Transformer (GPT) based model with relative attention for de novo drug design
    Haroon, Suhail
    Hafsath, C. A.
    Jereesh, A. S.
    COMPUTATIONAL BIOLOGY AND CHEMISTRY, 2023, 106
  • [30] Students' Perspectives on the Application of a Generative Pre-Trained Transformer (GPT) in Chemistry Learning: A Case Study in Indonesia
    Ardyansyah, Ananta
    Yuwono, Agung Budhi
    Rahayu, Sri
    Alsulami, Naif Mastoor
    Sulistina, Oktavia
    JOURNAL OF CHEMICAL EDUCATION, 2024, 101 (09) : 3666 - 3675