Use of Natural Language Processing to Extract and Classify Papillary Thyroid Cancer Features From Surgical Pathology Reports

Times cited: 2
Authors
Loor-Torres, Ricardo [1]
Wu, Yuqi [2]
Cabezas, Esteban [1]
Borras-Osorio, Mariana [1]
Toro-Tobon, David [3]
Duran, Mayra [1]
Al Zahidy, Misk [1]
Chavez, Maria Mateo [1]
Jacome, Cristian Soto [1]
Fan, Jungwei W. [2]
Ospina, Naykky M. Singh [4]
Wu, Yonghui [5]
Brito, Juan P. [1,3]
Affiliations
[1] Mayo Clinic, Division of Endocrinology, Diabetes, Nutrition and Metabolism, Knowledge and Evaluation Research Unit, 200 First St SW, Rochester, MN 55902, USA
[2] Mayo Clinic, Department of Artificial Intelligence and Informatics, Rochester, MN, USA
[3] Mayo Clinic, Division of Endocrinology, Diabetes, Metabolism and Nutrition, Rochester, MN, USA
[4] University of Florida, Department of Medicine, Division of Endocrinology, Gainesville, FL, USA
[5] University of Florida, Department of Health Outcomes and Biomedical Informatics, Gainesville, FL, USA
Funding
National Institutes of Health (NIH)
Keywords
artificial intelligence; natural language processing; thyroid cancer
DOI
10.1016/j.eprac.2024.08.008
Chinese Library Classification number
R5 [Internal Medicine]
Discipline classification code
1002; 100201
Abstract
Background: We aimed to use natural language processing (NLP) to automate the extraction and classification of thyroid cancer risk factors from pathology reports.
Methods: We analyzed 1410 surgical pathology reports from adult patients with papillary thyroid cancer from 2010 to 2019. Structured reports followed standardized formats, while unstructured reports were narrative. Both types were used to create a consensus-based ground-truth dictionary and were categorized into modified recurrence risk levels. We developed ThyroPath, a rule-based NLP pipeline, to extract thyroid cancer features and classify them into risk categories. The pipeline was trained on 225 reports (150 structured, 75 unstructured) and evaluated on 170 test reports (120 structured, 50 unstructured). Its performance was assessed with both strict and lenient criteria for accuracy, precision, recall, and F1-score (the harmonic mean of precision and recall).
Results: In extraction tasks, ThyroPath achieved overall strict F1 scores of 93% for structured reports and 90% for unstructured reports across 18 thyroid cancer pathology features. In classification tasks, ThyroPath-extracted information categorized reports by their guideline-based risk of recurrence with an overall accuracy of 93%: 76.9% for high-risk, 86.8% for intermediate-risk, and 100% for both low- and very-low-risk cases. When given human-extracted pathology information, however, ThyroPath achieved 100% accuracy across all risk categories.
Conclusions: ThyroPath shows promise for automating the extraction and recurrence risk classification of thyroid pathology reports at large scale. It offers a solution to laborious manual review and could help advance virtual registries. However, it requires further validation before implementation.
(c) 2024 AACE. Published by Elsevier Inc. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
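The abstract reports strict and lenient scores but this record does not define the two criteria. A common convention in clinical information extraction, assumed here rather than taken from the paper, is that a strict match requires an extracted span to carry the same feature label and exactly the same character boundaries as the ground-truth annotation, while a lenient match only requires the same label and overlapping boundaries. The Python sketch below computes precision, recall, and F1 (the harmonic mean of the two) under both criteria. The span representation and the names `strict_match`, `lenient_match`, and `prf1` are hypothetical and are not part of ThyroPath.

```python
from typing import Callable, List, Tuple

# Hypothetical span format (an assumption, not ThyroPath's):
# (feature_label, start_offset, end_offset) within the report text.
Span = Tuple[str, int, int]


def strict_match(gold: Span, pred: Span) -> bool:
    # Strict criterion (assumed): label and both offsets must agree exactly.
    return gold == pred


def lenient_match(gold: Span, pred: Span) -> bool:
    # Lenient criterion (assumed): same label and the half-open
    # character ranges [start, end) overlap by at least one character.
    return gold[0] == pred[0] and pred[1] < gold[2] and gold[1] < pred[2]


def prf1(
    gold: List[Span],
    pred: List[Span],
    match: Callable[[Span, Span], bool],
) -> Tuple[float, float, float]:
    # A prediction counts toward precision if it matches any gold span;
    # a gold span counts toward recall if any prediction matches it.
    # This simple version does not enforce one-to-one pairing.
    tp_pred = sum(any(match(g, p) for g in gold) for p in pred)
    tp_gold = sum(any(match(g, p) for p in pred) for g in gold)
    precision = tp_pred / len(pred) if pred else 0.0
    recall = tp_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, suppose the ground truth holds a histology span at characters 10-31, a tumor-size span at 50-56, and a margins span at 80-95, and the pipeline returns the exact histology span, a truncated tumor-size span at 50-53, and a spurious lymph-node span. The strict F1 is then about 0.33 and the lenient F1 about 0.67, because the truncated tumor-size span counts only under the lenient criterion. This illustrates why lenient scores are typically at least as high as strict ones.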
Pages: 1051-1058
Page count: 8