Development and Validation of a Natural Language Processing Algorithm for Extracting Clinical and Pathological Features of Breast Cancer From Pathology Reports

Authors
Munzone, Elisabetta [1]
Marra, Antonio [2]
Comotto, Federico [3]
Guercio, Lorenzo [3]
Sangalli, Claudia Anna [4]
Lo Cascio, Martina [5]
Pagan, Eleonora [6]
Sangalli, Davide [5]
Bigoni, Ilaria [3]
Porta, Francesca Maria [7]
D'Ercole, Marianna [7]
Ritorti, Fabiana [3]
Bagnardi, Vincenzo [6]
Fusco, Nicola [7, 8]
Curigliano, Giuseppe [2, 8]
Affiliations
[1] IRCCS, European Inst Oncol, Div Med Senol, Milan, Italy
[2] IRCCS, European Inst Oncol, Div Early Drug Dev Innovat Therapies, Milan, Italy
[3] Reply SPA, Turin, Italy
[4] IRCCS, European Inst Oncol, Clin Trial Off, Milan, Italy
[5] IRCCS, European Inst Oncol, Cent Management Informat Syst & Technol, Milan, Italy
[6] Univ Milano Bicocca, Dept Stat & Quantitat Methods, Milan, Italy
[7] IRCCS, European Inst Oncol, Div Pathol, Milan, Italy
[8] Univ Milan, Dept Oncol & Hemato Oncol, Milan, Italy
DOI
10.1200/CCI.24.00034
Chinese Library Classification
R73 [Oncology]
Discipline classification code
100214
Abstract
PURPOSE: Electronic health records (EHRs) are valuable information repositories that offer insights for enhancing clinical research on breast cancer (BC) using real-world data. The objective of this study was to develop a natural language processing (NLP) model specifically designed to extract structured data from BC pathology reports written in natural language.
METHODS: During the initial phase, the algorithm's development cohort comprised 193 pathology reports from 116 patients with BC from 2012 to 2016. A rule-based NLP algorithm was applied to extract 26 variables for analysis and was compared with the manual extraction of data performed by both a data entry specialist and an oncologist. Following the first approach, the data set was expanded to include 513 reports, and a Named Entity Recognition (NER)-NLP model was trained and evaluated using K-fold cross-validation.
RESULTS: The first approach led to a concordance analysis, which revealed an 82.9% agreement between the algorithm and the oncologist, whereas the concordance between the data entry specialist and the oncologist was 90.8%. The second training approach introduced the definition of an NER-NLP model, in which the accuracy showed remarkable potential (97.8%). Notably, the model demonstrated remarkable performance, especially for parameters such as estrogen receptor, progesterone receptor, human epidermal growth factor receptor 2, and Ki-67 (F1-score 1.0).
CONCLUSION: The present study aligns with the rapidly evolving field of artificial intelligence (AI) applications in oncology, seeking to expedite the development of complex cancer databases and registries. The results of the model are currently undergoing postprocessing procedures to organize the data into tabular structures, facilitating their utilization in real-world clinical and research endeavors. A high-accuracy NLP model was developed to extract structured data from breast cancer pathology reports.
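To make the first phase of the abstract concrete, the following is a minimal sketch of rule-based extraction followed by a concordance check against a manual reference. It is not the authors' algorithm: the regular expressions, the variable names (`ER`, `PR`, `Ki67`, `HER2`), the sample report text, and the simple percent-agreement definition are all assumptions made for illustration, and the study extracted 26 variables rather than the four shown here.

```python
import re

# Hypothetical patterns for illustration only; NOT the rules used in the study.
PATTERNS = {
    "ER": re.compile(r"\bestrogen receptor\s*[:=]?\s*(\d{1,3})\s*%", re.I),
    "PR": re.compile(r"\bprogesterone receptor\s*[:=]?\s*(\d{1,3})\s*%", re.I),
    "Ki67": re.compile(r"\bKi-?67\s*[:=]?\s*(\d{1,3})\s*%", re.I),
    "HER2": re.compile(r"\bHER-?2\s*[:=]?\s*(?:score\s*)?([0-3]\+?)", re.I),
}


def extract(report):
    """Return the first match for each variable, or None when a rule finds nothing."""
    result = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(report)
        result[name] = match.group(1) if match else None
    return result


def concordance(a, b):
    """Percentage of variables on which two extractions agree (assumed definition)."""
    keys = sorted(set(a) | set(b))
    agree = sum(1 for k in keys if a.get(k) == b.get(k))
    return 100.0 * agree / len(keys)


# Invented sample text and invented manual annotation, for demonstration only.
report = ("Estrogen receptor: 90%. Progesterone receptor: 40%. "
          "Ki-67: 18%. HER2 score 2+.")
algo = extract(report)
manual = {"ER": "90", "PR": "40", "Ki67": "20", "HER2": "2+"}

print(algo)
print(concordance(algo, manual))
```

In this toy case the rules and the manual annotation differ only on Ki-67, so three of four variables agree. The NER model of the second phase would replace the hand-written patterns with a trained tagger; that step is not sketched here because the abstract does not specify the model or library used.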
Pages: 9