Feasibility of Using the Privacy-preserving Large Language Model Vicuna for Labeling Radiology Reports

Cited by: 42
Authors
Mukherjee, Pritam [1 ]
Hou, Benjamin [1 ]
Lanfredi, Ricardo B. [1 ]
Summers, Ronald M. [1 ]
Affiliations
[1] Natl Inst Hlth Clin Ctr, Dept Radiol & Imaging Sci, Imaging Biomarkers & Comp Aided Diag Lab, Bldg 10,Room 1C224D,10 Ctr Dr, Bethesda, MD 20892 USA
Funding
National Institutes of Health (NIH)
Keywords
AGREEMENT;
DOI
10.1148/radiol.231147
Chinese Library Classification (CLC)
R8 [Special Medicine]; R445 [Diagnostic Imaging]
Discipline Codes
1002; 100207; 1009
Abstract
Background: Large language models (LLMs) such as ChatGPT, though proficient in many text-based tasks, are not suitable for use with radiology reports due to patient privacy constraints.
Purpose: To test the feasibility of using an alternative LLM (Vicuna-13B) that can be run locally for labeling radiography reports.
Materials and Methods: Chest radiography reports from the MIMIC-CXR and National Institutes of Health (NIH) data sets were included in this retrospective study. Reports were examined for 13 findings. Outputs reporting the presence or absence of the 13 findings were generated by Vicuna using a single-step or multistep prompting strategy (prompts 1 and 2, respectively). Agreement between Vicuna outputs and the CheXpert and CheXbert labelers was assessed using Fleiss kappa. Agreement between Vicuna outputs from three runs under a hyperparameter setting that introduced some randomness (temperature, 0.7) was also assessed. The performance of Vicuna and the labelers was assessed in a subset of 100 NIH reports annotated by a radiologist using area under the receiver operating characteristic curve (AUC).
Results: A total of 3269 reports from the MIMIC-CXR data set (median patient age, 68 years [IQR, 59-79 years]; 161 male patients) and 25 596 reports from the NIH data set (median patient age, 47 years [IQR, 32-58 years]; 1557 male patients) were included. Vicuna outputs with prompt 2 showed, on average, moderate to substantial agreement with the labelers on both the MIMIC-CXR (kappa median, 0.57 [IQR, 0.45-0.66] with CheXpert and 0.64 [IQR, 0.45-0.68] with CheXbert) and NIH (kappa median, 0.52 [IQR, 0.41-0.65] with CheXpert and 0.55 [IQR, 0.41-0.74] with CheXbert) data sets. Vicuna with prompt 2 performed on par (median AUC, 0.84 [IQR, 0.74-0.93]) with both labelers on nine of 11 findings.
Conclusion: In this proof-of-concept study, outputs of the LLM Vicuna reporting the presence or absence of 13 findings on chest radiography reports showed moderate to substantial agreement with existing labelers. (c) RSNA, 2023
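The agreement statistic central to this study, Fleiss kappa, generalizes Cohen kappa to more than two raters. The sketch below is an illustrative reimplementation from the standard formula, not the authors' code; the toy table of three "raters" (e.g., Vicuna, CheXpert, CheXbert) labeling six reports is invented for demonstration.

```python
import numpy as np

def fleiss_kappa(counts: np.ndarray) -> float:
    """Fleiss kappa for a (subjects x categories) table of rating counts.

    Each row holds, for one subject, how many raters assigned each
    category; every row sums to the number of raters. Values near 1
    indicate strong agreement; 0.41-0.60 is conventionally read as
    moderate and 0.61-0.80 as substantial.
    """
    counts = np.asarray(counts, dtype=float)
    n_subjects, _ = counts.shape
    n_raters = counts[0].sum()
    # Observed per-subject agreement P_i, averaged over subjects
    p_i = (np.square(counts).sum(axis=1) - n_raters) / (n_raters * (n_raters - 1))
    p_bar = p_i.mean()
    # Chance agreement P_e from the marginal category proportions
    p_j = counts.sum(axis=0) / (n_subjects * n_raters)
    p_e = np.square(p_j).sum()
    return (p_bar - p_e) / (1 - p_e)

# Toy example: 3 raters labeling 6 reports as
# finding absent (column 0) or present (column 1).
table = np.array([
    [3, 0], [0, 3], [3, 0], [1, 2], [2, 1], [0, 3],
])
print(round(fleiss_kappa(table), 3))  # → 0.556 (moderate agreement)
```

In the study, one such table per finding (13 in all) yields the per-finding kappas whose medians and IQRs are reported above.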
Pages: 11
Related Papers
50 records total
  • [1] Feasibility and Prospect of Privacy-preserving Large Language Models in Radiology
    Cai, Wenli
    RADIOLOGY, 2023, 309 (01)
  • [2] Local large language models for privacy-preserving accelerated review of historic echocardiogram reports
    Vaid, Akhil
    Duong, Son Q.
    Lampert, Joshua
    Kovatch, Patricia
    Freeman, Robert
    Argulian, Edgar
    Croft, Lori
    Lerakis, Stamatios
    Goldman, Martin
    Khera, Rohan
    Nadkarni, Girish N.
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2024
  • [3] Automated detection of in-hospital drug hypersensitivity reactions using a privacy-preserving large language model
    Dezoteux, Frederic
    Mille, Baptiste
    Shorten, Lucas
    Dehame, Lea
    Badet, Albane
    Staumont-Salle, Delphine
    Bene, Johana
    Rispal, Marie-Amelie
    Hamroun, Aghiles
    Le Guellec, Bastien
    JOURNAL OF THE EUROPEAN ACADEMY OF DERMATOLOGY AND VENEREOLOGY, 2025
  • [4] Detection of suicidality from medical text using privacy-preserving large language models
    Wiest, Isabella Catharina
    Verhees, Falk Gerrik
    Ferber, Dyke
    Zhu, Jiefu
    Bauer, Michael
    Lewitzka, Ute
    Pfennig, Andrea
    Mikolas, Pavol
    Kather, Jakob Nikolas
    BRITISH JOURNAL OF PSYCHIATRY, 2024, 225 (06) : 532 - 537
  • [5] Privacy-preserving large language models for structured medical information retrieval
    Wiest, Isabella Catharina
    Ferber, Dyke
    Zhu, Jiefu
    van Treeck, Marko
    Meyer, Sonja K.
    Juglan, Radhika
    Carrero, Zunamys I.
    Paech, Daniel
    Kleesiek, Jens
    Ebert, Matthias P.
    Truhn, Daniel
    Kather, Jakob Nikolas
    NPJ DIGITAL MEDICINE, 2024, 7 (01)
  • [6] Model averaging with privacy-preserving
    He, Baihua
    Dong, Fangli
    COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2022, 51 (04) : 1401 - 1414
  • [7] AXpert: human expert facilitated privacy-preserving large language models for abdominal X-ray report labeling
    Zhang, Yufeng
    Kohne, Joseph G.
    Webster, Katherine
    Vartanian, Rebecca
    Wittrup, Emily
    Najarian, Kayvan
    JAMIA OPEN, 2025, 8 (01)
  • [8] Selective privacy-preserving framework for large language models fine-tuning
    Wang, Teng
    Zhai, Lindong
    Yang, Tengfei
    Luo, Zhucheng
    Liu, Shuanggen
    INFORMATION SCIENCES, 2024, 678
  • [9] Privacy-Preserving Techniques in Generative AI and Large Language Models: A Narrative Review
    Feretzakis, Georgios
    Papaspyridis, Konstantinos
    Gkoulalas-Divanis, Aris
    Verykios, Vassilios S.
    INFORMATION, 2024, 15 (11)
  • [10] InferDPT: Privacy-preserving Inference for Black-box Large Language Models
    Tong, Meng
    Chen, Kejiang
    Zhang, Jie
    Qi, Yuang
    Zhang, Weiming
    Yu, Nenghai
    Zhang, Tianwei
    Zhang, Zhikun
    arXiv, 2023