Extracting lung cancer staging descriptors from pathology reports: A generative language model approach

Cited by: 2
Authors
Cho, Hyeongmin [1]
Yoo, Sooyoung [2]
Kim, Borham [2]
Jang, Sowon [3]
Sunwoo, Leonard [3]
Kim, Sanghwan [1]
Lee, Donghyoung [1]
Kim, Seok [2]
Nam, Sejin [1]
Chung, Jin-Haeng [4, 5]
Affiliations
[1] ezCaretech Res & Dev Ctr, Seoul, South Korea
[2] Seoul Natl Univ, Bundang Hosp, Off eHlth Res & Business, Seongnam, South Korea
[3] Seoul Natl Univ, Bundang Hosp, Dept Radiol, Seongnam, South Korea
[4] Seoul Natl Univ, Coll Med, Dept Pathol, Seoul, South Korea
[5] Seoul Natl Univ, Bundang Hosp, Dept Pathol & Translat Med, Seongnam, South Korea
Keywords
Deep learning; Natural language processing; Large language model; Information extraction; Pathology report; Tumor-node classification; CLASSIFICATION; EDITION;
DOI
10.1016/j.jbi.2024.104720
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification codes
081203; 0835;
Abstract
Background: In oncology, electronic health records contain key textual information for the diagnosis, staging, and treatment planning of patients with cancer. However, processing text data requires considerable time and effort, which limits the use of these data. Recent advances in natural language processing (NLP), including large language models, can be applied to cancer research. In particular, extracting the information required for pathological staging from surgical pathology reports makes it possible to update cancer staging according to the latest staging guidelines.
Objectives: This study has two main objectives. The first is to evaluate how well fine-tuned generative language models (GLMs) extract information from text-based surgical pathology reports of patients with lung cancer and determine pathological stages from the extracted information. The second is to determine the feasibility of using relatively small GLMs for information extraction in a resource-constrained computing environment.
Methods: Lung cancer surgical pathology reports were collected from the Common Data Model database of Seoul National University Bundang Hospital (SNUBH), a tertiary hospital in Korea. We selected 42 descriptors necessary for tumor-node (TN) classification based on these reports and created a gold standard validated by two clinical experts. The pathology reports and gold standard were used to generate prompt-response pairs for training and evaluating GLMs, which were then used to extract the information required for staging from pathology reports.
Results: We evaluated the information extraction performance of six trained models as well as their performance in TN classification using the extracted information. The Deductive Mistral-7B model, which was pre-trained with the deductive dataset, showed the best performance overall, with an exact match ratio of 92.24% in information extraction and an accuracy of 0.9876 in classification (predicting T and N classification concurrently).
Conclusion: This study demonstrated that training GLMs with deductive datasets can improve information extraction performance, and that GLMs with a relatively small number of parameters (approximately seven billion) can achieve high performance on this problem. The proposed GLM-based information extraction method is expected to be useful in clinical decision-making support, lung cancer staging, and research.
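The two metrics reported in the Results can be illustrated with a minimal Python sketch. The abstract does not define them precisely, so this assumes the exact match ratio is computed per report (every extracted descriptor must equal the gold standard) and that the concurrent TN accuracy counts a report as correct only when both T and N are right. The function names and the descriptor keys (`tumor_size_cm`, `pleural_invasion`) are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Tuple


def exact_match_ratio(predicted: List[Dict[str, str]],
                      gold: List[Dict[str, str]]) -> float:
    """Fraction of reports whose extracted descriptors all equal the gold standard.

    Assumption: matching is done per report, over the whole descriptor set.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    matches = sum(1 for p, g in zip(predicted, gold) if p == g)
    return matches / len(gold)


def joint_tn_accuracy(predicted: List[Tuple[str, str]],
                      gold: List[Tuple[str, str]]) -> float:
    """Fraction of reports where the (T, N) pair is predicted correctly as a whole."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return correct / len(gold)


# Hypothetical toy data: two descriptors per report instead of the 42 in the study.
gold_descriptors = [
    {"tumor_size_cm": "2.5", "pleural_invasion": "absent"},
    {"tumor_size_cm": "4.1", "pleural_invasion": "present"},
    {"tumor_size_cm": "1.0", "pleural_invasion": "absent"},
    {"tumor_size_cm": "3.3", "pleural_invasion": "absent"},
]
predicted_descriptors = [
    {"tumor_size_cm": "2.5", "pleural_invasion": "absent"},
    {"tumor_size_cm": "4.1", "pleural_invasion": "absent"},  # one descriptor wrong
    {"tumor_size_cm": "1.0", "pleural_invasion": "absent"},
    {"tumor_size_cm": "3.3", "pleural_invasion": "absent"},
]

gold_tn = [("T1c", "N0"), ("T2b", "N1"), ("T1a", "N0"), ("T2a", "N2")]
predicted_tn = [("T1c", "N0"), ("T2b", "N0"), ("T1a", "N0"), ("T2a", "N2")]  # one N wrong

emr = exact_match_ratio(predicted_descriptors, gold_descriptors)
acc = joint_tn_accuracy(predicted_tn, gold_tn)
print(f"exact match ratio: {emr:.2%}, joint TN accuracy: {acc:.4f}")
```

Under these assumptions, a single wrong descriptor or a single wrong N category makes the whole report count as a miss, which is why both metrics are stricter than per-field accuracy.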
Pages: 11