Large Language Model Approach for Zero-Shot Information Extraction and Clustering of Japanese Radiology Reports: Algorithm Development and Validation

Cited by: 0

Authors:

Yamagishi, Yosuke [1]

Nakamura, Yuta [2]

Hanaoka, Shouhei [1]

Abe, Osamu [1]

Affiliations:

[1] Univ Tokyo, Grad Sch Med, Div Radiol & Biomed Engn, 7-3-1 Hongo, Bunkyo Ku, Tokyo 1138655, Japan

[2] Univ Tokyo Hosp, Dept Computat Diagnost Radiol & Prevent Med, Tokyo, Japan

Source:

JMIR CANCER | 2025 / Vol. 11

Keywords:

radiology reports; clustering; large language model; natural language processing; information extraction; lung cancer; machine learning;

DOI:

10.2196/57275

Chinese Library Classification (CLC) number:

R73 [Oncology];

Discipline classification code:

100214 ;

Abstract:

Background: The application of natural language processing in medicine has increased significantly, including tasks such as information extraction and classification. Natural language processing plays a crucial role in structuring free-form radiology reports, facilitating the interpretation of textual content, and enhancing data utility through clustering techniques. Clustering allows for the identification of similar lesions and disease patterns across a broad dataset, making it useful for aggregating information and discovering new insights in medical imaging. However, most publicly available medical datasets are in English, with limited resources in other languages. This scarcity poses a challenge for the development of models geared toward non-English downstream tasks.

Objective: This study aimed to develop and evaluate an algorithm that uses large language models (LLMs) to extract information from Japanese lung cancer radiology reports and perform clustering analysis. The effectiveness of this approach was assessed and compared with previous supervised methods.

Methods: This study employed the MedTxt-RR dataset, comprising 135 Japanese radiology reports from 9 radiologists who interpreted the computed tomography images of 15 lung cancer patients obtained from Radiopaedia. Because this dataset was previously used in the NTCIR-16 (NII Testbeds and Community for Information Access Research) shared task, a competition on clustering performance, it was well suited to comparing the clustering ability of our algorithm with that of previous methods. The dataset was split into 8 cases for development and 7 cases for testing. The approach used the LLM to extract information pertinent to lung cancer findings, transformed that information into numeric features, and clustered the reports with the K-means method. Performance was evaluated using all 135 reports for information extraction accuracy and the 63 test reports for clustering performance. The information extraction evaluation focused on the accuracy of extracting tumor size, location, and laterality from the reports. Clustering performance was evaluated using normalized mutual information, adjusted mutual information, and the Fowlkes-Mallows index for both the development and test data.

Results: Tumor size was accurately identified in 99 of 135 reports (73.3%), with errors in 36 reports (26.7%), primarily due to missing or incorrect size information. Tumor location and laterality were identified with greater accuracy, in 112 of 135 reports (83%); however, 23 reports (17%) contained errors, mainly due to empty values or incorrect data. On the test data, clustering achieved a normalized mutual information of 0.6414, an adjusted mutual information of 0.5598, and a Fowlkes-Mallows index of 0.5354. The proposed method demonstrated superior performance across all evaluation metrics compared to previous methods.

Conclusions: The unsupervised LLM approach surpassed the existing supervised methods in clustering Japanese radiology reports. These findings suggest that LLMs hold promise for extracting information from radiology reports and integrating it into disease-specific knowledge structures.
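Two of the clustering metrics named in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' code. The function names and the toy labels are hypothetical, and the normalized mutual information here divides by the arithmetic mean of the two entropies, which is an assumption because the abstract does not state the normalization used. Adjusted mutual information is omitted because it additionally requires a chance-correction term.

```python
import math
from collections import Counter


def normalized_mutual_info(labels_true, labels_pred):
    """NMI with arithmetic-mean normalization (an assumption, see lead-in)."""
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Mutual information between the two labelings (natural log).
    mi = sum(
        (c / n) * math.log(c * n / (true_counts[t] * pred_counts[p]))
        for (t, p), c in joint.items()
    )
    # Entropy of each labeling.
    h_true = -sum((c / n) * math.log(c / n) for c in true_counts.values())
    h_pred = -sum((c / n) * math.log(c / n) for c in pred_counts.values())

    denom = (h_true + h_pred) / 2
    # Both labelings put everything in one group: treat as perfect agreement.
    return mi / denom if denom > 0 else 1.0


def fowlkes_mallows(labels_true, labels_pred):
    """FMI = TP / sqrt((TP + FP) * (TP + FN)), counted over pairs of items."""
    joint = Counter(zip(labels_true, labels_pred))
    # Pairs placed together in both labelings.
    tp = sum(c * (c - 1) // 2 for c in joint.values())
    # Pairs placed together in the predicted clustering (TP + FP).
    same_pred = sum(c * (c - 1) // 2 for c in Counter(labels_pred).values())
    # Pairs placed together in the reference labeling (TP + FN).
    same_true = sum(c * (c - 1) // 2 for c in Counter(labels_true).values())
    if same_pred == 0 or same_true == 0:
        return 0.0
    return tp / math.sqrt(same_pred * same_true)


# Hypothetical toy data: 6 reports from 2 cases, with one report misassigned.
case_ids = [0, 0, 0, 1, 1, 1]
cluster_ids = [0, 0, 1, 1, 1, 1]
nmi = normalized_mutual_info(case_ids, cluster_ids)
fmi = fowlkes_mallows(case_ids, cluster_ids)
print(f"NMI={nmi:.4f} FMI={fmi:.4f}")
```

Both scores depend only on how reports are grouped, not on the cluster label values, so a clustering that matches the cases under a different numbering still scores 1.0.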

Pages: 10
