Extracting lung cancer staging descriptors from pathology reports: A generative language model approach

Cited by: 2
Authors
Cho, Hyeongmin [1]
Yoo, Sooyoung [2]
Kim, Borham [2]
Jang, Sowon [3]
Sunwoo, Leonard [3]
Kim, Sanghwan [1]
Lee, Donghyoung [1]
Kim, Seok [2]
Nam, Sejin [1]
Chung, Jin-Haeng [4, 5]
Affiliations
[1] ezCaretech Res & Dev Ctr, Seoul, South Korea
[2] Seoul Natl Univ, Bundang Hosp, Off eHlth Res & Business, Seongnam, South Korea
[3] Seoul Natl Univ, Bundang Hosp, Dept Radiol, Seongnam, South Korea
[4] Seoul Natl Univ, Coll Med, Dept Pathol, Seoul, South Korea
[5] Seoul Natl Univ, Bundang Hosp, Dept Pathol & Translat Med, Seongnam, South Korea
Keywords
Deep learning; Natural language processing; Large language model; Information extraction; Pathology report; Tumor-node classification; CLASSIFICATION; EDITION;
DOI
10.1016/j.jbi.2024.104720
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
Background: In oncology, electronic health records contain key textual information for the diagnosis, staging, and treatment planning of patients with cancer. However, processing text data requires considerable time and effort, which limits the use of these data. Recent advances in natural language processing (NLP) technology, including large language models, can be applied to cancer research. In particular, the information required for pathological staging can be extracted from surgical pathology reports and used to update cancer staging according to the latest cancer staging guidelines.
Objectives: This study has two main objectives. The first is to evaluate how well fine-tuned generative language models (GLMs) extract information from text-based surgical pathology reports and determine pathological stages from the extracted information for patients with lung cancer. The second is to determine the feasibility of using relatively small GLMs for information extraction in a resource-constrained computing environment.
Methods: Lung cancer surgical pathology reports were collected from the Common Data Model database of Seoul National University Bundang Hospital (SNUBH), a tertiary hospital in Korea. We selected 42 descriptors necessary for tumor-node (TN) classification based on these reports and created a gold standard validated by two clinical experts. The pathology reports and gold standard were used to generate prompt-response pairs for training and evaluating GLMs, which were then used to extract the information required for staging from pathology reports.
Results: We evaluated the information extraction performance of six trained models as well as their performance in TN classification using the extracted information. The Deductive Mistral-7B model, which was pre-trained with the deductive dataset, showed the best performance overall, with an exact match ratio of 92.24% in the information extraction problem and an accuracy of 0.9876 in classification (predicting T and N classification concurrently).
Conclusion: This study demonstrated that training GLMs with deductive datasets can improve information extraction performance, and that GLMs with a relatively small number of parameters, approximately seven billion, can achieve high performance in this problem. The proposed GLM-based information extraction method is expected to be useful in clinical decision-making support, lung cancer staging, and research.
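The two metrics named in the Results can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: that the exact match ratio is computed per report (a report counts only if every extracted descriptor equals the gold standard), and that concurrent TN accuracy counts a report as correct only when both T and N are right. The function names and the descriptor keys are hypothetical, not taken from the paper.

```python
from typing import Dict, List, Tuple


def exact_match_ratio(predicted: List[Dict[str, str]],
                      gold: List[Dict[str, str]]) -> float:
    """Share of reports whose extracted descriptors all equal the gold standard.

    Assumption: matching is done per report, not per descriptor.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    matches = sum(1 for p, g in zip(predicted, gold) if p == g)
    return matches / len(gold)


def joint_tn_accuracy(predicted: List[Tuple[str, str]],
                      gold: List[Tuple[str, str]]) -> float:
    """Share of reports where both the T and the N category are correct.

    Assumption: each item is a (T, N) pair, e.g. ("T1c", "N0").
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return correct / len(gold)


# Hypothetical descriptor keys, for illustration only.
gold_reports = [
    {"tumor_size_cm": "2.5", "pleural_invasion": "absent"},
    {"tumor_size_cm": "4.0", "pleural_invasion": "absent"},
]
predicted_reports = [
    {"tumor_size_cm": "2.5", "pleural_invasion": "absent"},
    {"tumor_size_cm": "4.0", "pleural_invasion": "present"},
]
em = exact_match_ratio(predicted_reports, gold_reports)
tn = joint_tn_accuracy([("T1c", "N0"), ("T2a", "N1")],
                       [("T1c", "N0"), ("T2a", "N0")])
```

Under the per-report assumption, a single wrong descriptor makes the whole report a miss, so this reading is stricter than a per-descriptor average.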
Pages: 11
Related papers
50 in total
  • [1] Developing prompts from large language model for extracting clinical information from pathology and ultrasound reports in breast cancer
    Choi, Hyeon Seok
    Song, Jun Yeong
    Shin, Kyung Hwan
    Chang, Ji Hyun
    Jang, Bum-Sup
    RADIATION ONCOLOGY JOURNAL, 2023, 41 (03) : 209 - 216
  • [2] Automatic Lung Cancer Staging from Medical Reports Using Natural Language Processing
    Sui, X.
    Liu, T.
    Huang, Q.
    Hou, Y.
    Wang, Y.
    Kang, G.
    Guo, H.
    Li, N.
    Li, Y.
    Wang, Z.
    Wang, J.
    JOURNAL OF THORACIC ONCOLOGY, 2018, 13 (10) : S772 - S772
  • [3] Automatically extracting cancer disease characteristics from pathology reports into a Disease Knowledge Representation Model
    Coden, Anni
    Savova, Guergana
    Sominsky, Igor
    Tanenblatt, Michael
    Masanz, James
    Schuler, Karin
    Cooper, James
    Guan, Wei
    de Groen, Piet C.
    JOURNAL OF BIOMEDICAL INFORMATICS, 2009, 42 (05) : 937 - 949
  • [4] Lung cancer staging: pathology issues
    Sica, Gabriel L.
    Gal, Anthony A.
    SEMINARS IN DIAGNOSTIC PATHOLOGY, 2012, 29 (03) : 116 - 126
  • [5] A deep learning approach for extracting clinically relevant information from pathology reports
    Saib, Waheeda Banu
    Kumar, Pavan
    Siwo, Geoffrey
    Dlamini, Gciniwe
    Singh, Elvira
    Candy, Sue
    Klipin, Michael
    CANCER RESEARCH, 2017, 77 (22)
  • [6] Development and Validation of a Natural Language Processing Algorithm for Extracting Clinical and Pathological Features of Breast Cancer From Pathology Reports
    Munzone, Elisabetta
    Marra, Antonio
    Comotto, Federico
    Guercio, Lorenzo
    Sangalli, Claudia Anna
    Lo Cascio, Martina
    Pagan, Eleonora
    Sangalli, Davide
    Bigoni, Ilaria
    Porta, Francesca Maria
    D'Ercole, Marianna
    Ritorti, Fabiana
    Bagnardi, Vincenzo
    Fusco, Nicola
    Curigliano, Giuseppe
    JCO CLINICAL CANCER INFORMATICS, 2024, 8
  • [7] Machine Learning Approaches for Extracting Stage from Pathology Reports in Prostate Cancer
    Lenain, Raphael
    Seneviratne, Martin G.
    Bozkurt, Selen
    Blayney, Douglas W.
    Brooks, James D.
    Hernandez-Boussard, Tina
    MEDINFO 2019: HEALTH AND WELLBEING E-NETWORKS FOR ALL, 2019, 264 : 1522 - 1523
  • [8] EXTRACTING STRUCTURED INFORMATION FROM PATHOLOGY REPORTS USING NATURAL LANGUAGE PROCESSING AND MACHINE LEARNING
    Odisho, Anobel
    Park, Briton
    Altieri, Nicholas
    Murdoch, William
    Carroll, Peter
    Cooperberg, Matthew
    Yu, Bin
    JOURNAL OF UROLOGY, 2019, 201 (04) : E1031 - E1032
  • [9] LUNG-CANCER - THE LANGUAGE OF STAGING
    ENGELKING, C
    AMERICAN JOURNAL OF NURSING, 1987, 87 (11) : 1434 - &
  • [10] Automatic Extraction of Lung Cancer Staging Information From Computed Tomography Reports: Deep Learning Approach
    Hu, Danqing
    Zhang, Huanyao
    Li, Shaolei
    Wang, Yuhong
    Wu, Nan
    Lu, Xudong
    JMIR MEDICAL INFORMATICS, 2021, 9 (07)