Large Language Model Approach for Zero-Shot Information Extraction and Clustering of Japanese Radiology Reports: Algorithm Development and Validation

Authors
Yamagishi, Yosuke [1]
Nakamura, Yuta [2]
Hanaoka, Shouhei [1]
Abe, Osamu [1]
Affiliations
[1] Univ Tokyo, Grad Sch Med, Div Radiol & Biomed Engn, 7-3-1 Hongo,Bunkyo Ku, Tokyo 1138655, Japan
[2] Univ Tokyo Hosp, Dept Computat Diagnost Radiol & Prevent Med, Tokyo, Japan
Source
JMIR CANCER | 2025 / Vol. 11
Keywords
radiology reports; clustering; large language model; natural language processing; information extraction; lung cancer; machine learning
DOI
10.2196/57275
Chinese Library Classification number
R73 [Oncology]
Discipline classification code
100214
Abstract
Background: The application of natural language processing in medicine has increased significantly, including tasks such as information extraction and classification. Natural language processing plays a crucial role in structuring free-form radiology reports, facilitating the interpretation of textual content, and enhancing data utility through clustering techniques. Clustering allows for the identification of similar lesions and disease patterns across a broad dataset, making it useful for aggregating information and discovering new insights in medical imaging. However, most publicly available medical datasets are in English, with limited resources in other languages. This scarcity poses a challenge for the development of models geared toward non-English downstream tasks.
Objective: This study aimed to develop and evaluate an algorithm that uses large language models (LLMs) to extract information from Japanese lung cancer radiology reports and perform clustering analysis. The effectiveness of this approach was assessed and compared with previous supervised methods.
Methods: This study employed the MedTxt-RR dataset, comprising 135 Japanese radiology reports from 9 radiologists who interpreted the computed tomography images of 15 lung cancer patients obtained from Radiopaedia. Because this dataset was previously used in the NTCIR-16 (NII Testbeds and Community for Information Access Research) shared task, a competition on clustering performance, it was well suited for comparing the clustering ability of our algorithm with that of previous methods. The dataset was split into 8 cases for development and 7 cases for testing. The approach used the LLM to extract information pertinent to lung cancer findings and transformed it into numeric features, which were then clustered with the K-means method. Information extraction accuracy was evaluated on all 135 reports, and clustering performance was evaluated on the 63 test reports. The extraction evaluation focused on the accuracy of tumor size, location, and laterality. The clustering performance was evaluated using normalized mutual information, adjusted mutual information, and the Fowlkes-Mallows index for both the development and test data.
Results: The tumor size was accurately identified in 99 out of 135 reports (73.3%), with errors in 36 reports (26.7%), primarily due to missing or incorrect size information. Tumor location and laterality were identified with greater accuracy, in 112 out of 135 reports (83%); the remaining 23 reports (17%) contained errors, mainly due to empty values or incorrect data. Clustering performance on the test data yielded a normalized mutual information of 0.6414, an adjusted mutual information of 0.5598, and a Fowlkes-Mallows index of 0.5354. The proposed method demonstrated superior performance across all evaluation metrics compared to previous methods.
Conclusions: The unsupervised LLM approach surpassed the existing supervised methods in clustering Japanese radiology reports. These findings suggest that LLMs hold promise for extracting information from radiology reports and integrating it into disease-specific knowledge structures.
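The Methods describe turning extracted findings into numeric features, clustering them with K-means, and scoring the clusters against the true cases. The sketch below illustrates that general flow in plain Python under several assumptions that are not from the paper: the record fields (`size_mm`, lobe, side), the one-hot encoding, the size scaling, and the toy data are all hypothetical, and the LLM extraction step is replaced by hand-written records. It computes normalized mutual information (arithmetic-mean normalization) and the Fowlkes-Mallows index; adjusted mutual information is omitted for brevity.

```python
import math
from collections import Counter
from itertools import combinations

LOBES = ["upper", "middle", "lower"]   # hypothetical location vocabulary
SIDES = ["right", "left"]              # hypothetical laterality vocabulary

def encode(record):
    """Map one (size_mm, lobe, side) record to a numeric feature vector."""
    size_mm, lobe, side = record
    vec = [size_mm / 100.0]  # crude scaling, chosen only for illustration
    vec += [1.0 if lobe == name else 0.0 for name in LOBES]
    vec += [1.0 if side == name else 0.0 for name in SIDES]
    return vec

def kmeans(points, k, iters=50):
    """Lloyd's algorithm, seeded with the first k distinct points."""
    centers = []
    for p in points:
        if p not in centers:
            centers.append(list(p))
        if len(centers) == k:
            break
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(len(centers)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def fowlkes_mallows(true, pred):
    """FMI = TP / sqrt((TP + FP) * (TP + FN)), counted over report pairs."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(true)), 2):
        same_true, same_pred = true[i] == true[j], pred[i] == pred[j]
        if same_true and same_pred:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
    return tp / math.sqrt((tp + fp) * (tp + fn)) if tp else 0.0

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())

def nmi(true, pred):
    """Mutual information divided by the mean of the two label entropies."""
    n = len(true)
    ct, cp = Counter(true), Counter(pred)
    mi = sum(c / n * math.log(c * n / (ct[t] * cp[p]))
             for (t, p), c in Counter(zip(true, pred)).items())
    denom = (entropy(true) + entropy(pred)) / 2
    return mi / denom if denom else 1.0

# Toy stand-in for LLM output: two cases, three reports each (made-up values).
records = [(30, "upper", "right"), (32, "upper", "right"), (28, "upper", "right"),
           (15, "lower", "left"), (14, "lower", "left"), (16, "lower", "left")]
true_cases = [0, 0, 0, 1, 1, 1]

pred = kmeans([encode(r) for r in records], k=2)
print("NMI:", nmi(true_cases, pred))
print("FMI:", fowlkes_mallows(true_cases, pred))
```

In practice, scikit-learn's `KMeans`, `normalized_mutual_info_score`, `adjusted_mutual_info_score`, and `fowlkes_mallows_score` would be the more usual choice; the hand-rolled versions here only keep the example dependency-free.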
Pages: 10