Leveraging Natural Language Processing to Extract Features of Colorectal Polyps From Pathology Reports for Epidemiologic Study

Authors
Benson, Ryzen [1]
Winterton, Candace [2]
Winn, Maci [2, 3]
Krick, Benjamin [4]
Liu, Mei [5]
Abu-el-Rub, Noor [5]
Conway, Mike [6]
Del Fiol, Guilherme [1]
Gawron, Andrew [7, 8]
Hardikar, Sheetal [2, 3, 9]
Affiliations
[1] Univ Utah, Dept Biomed Informat, Salt Lake City, UT USA
[2] Univ Utah, Huntsman Canc Inst, Salt Lake City, UT USA
[3] Univ Utah, Dept Populat Hlth Sci, Salt Lake City, UT USA
[4] Duke Univ, Dept Polit Sci, Durham, NC USA
[5] Kansas Univ, Med Ctr, Dept Internal Med, Kansas City, KS USA
[6] Univ Melbourne, Sch Comp & Informat Syst, Parkville, Vic, Australia
[7] Univ Utah, Salt Lake City VA Specialty Care Ctr Innovat, Salt Lake City, UT USA
[8] Univ Utah, Dept Internal Med, Salt Lake City, UT USA
[9] 2000 Circle Hope Dr, Room 4711, Salt Lake City, UT 84112 USA
Funding
National Institutes of Health
Chinese Library Classification
R73 [Oncology]
Subject classification code
100214
Abstract
PURPOSE: Histopathologic features are critical for studying risk factors for colorectal polyps, but they remain embedded in unstructured pathology reports, so using them for research requires costly, time-consuming manual abstraction. In this study, we developed and evaluated a natural language processing (NLP) pipeline to automatically extract histopathologic features of colorectal polyps from pathology reports, with an emphasis on individual polyp size. The extracted data were then linked with structured electronic health record (EHR) data to create an analysis-ready epidemiologic data set.

METHODS: We obtained 24,584 pathology reports from colonoscopies performed at the University of Utah's Gastroenterology Clinic. Two investigators annotated 350 reports to determine inter-rater agreement, develop an annotation scheme, and create a reference standard for performance evaluation. We then developed the pipeline and compared its performance against the reference standard for extracting polyp location, histology, size, shape, dysplasia, and the number of polyps. Finally, we applied the pipeline to 24,225 unseen reports and linked the NLP-extracted data with structured EHR data.

RESULTS: Across all features, the pipeline achieved a precision of 98.9%, a recall of 98.0%, and an F1-score of 98.4%. In patients with polyps, it correctly extracted 95.6% of sizes, 97.2% of polyp locations, 97.8% of histology, 98.3% of shapes, and 98.3% of dysplasia levels. When applied to unseen data, the pipeline classified 12,889 patients as having polyps and 4,907 as having no polyps, and extracted the features of 28,387 polyps. Tubular adenomas were the most common subtype (55.9%), 8.1% of polyps were advanced adenomas, and the mean polyp size was 0.57 (±0.4) cm.

CONCLUSION: Our pipeline extracted histopathologic features of colorectal polyps from colonoscopy pathology reports, most notably individual polyp sizes, with high accuracy (overall F1-score of 98.4%). This study demonstrates the utility of NLP for extracting polyp features and linking them with EHR data to create an epidemiologic data set for studying colorectal polyp risk factors and outcomes.
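The abstract evaluates the pipeline by precision, recall, and F1-score, with particular emphasis on extracting individual polyp sizes. The Python sketch below illustrates both ideas under stated assumptions: the regular expression `SIZE_RE`, the helpers `extract_sizes_cm`, `precision_recall_f1`, and `f1_score`, and the sample report text are all hypothetical. This is not the authors' pipeline, whose extraction rules are not described here; it only shows one simple way to normalize size mentions to centimetres and score them against a reference standard.

```python
import re
from collections import Counter

# Hypothetical pattern (not the authors' rules): a number, optionally a second
# dimension joined by "x", followed by a unit of mm or cm.
SIZE_RE = re.compile(
    r"(\d+(?:\.\d+)?)(?:\s*x\s*(\d+(?:\.\d+)?))?\s*(mm|cm)\b",
    re.IGNORECASE,
)


def extract_sizes_cm(text: str) -> list[float]:
    """Return the largest dimension of each size mention, converted to cm."""
    sizes = []
    for match in SIZE_RE.finditer(text):
        dims = [float(d) for d in match.group(1, 2) if d is not None]
        largest = max(dims)
        if match.group(3).lower() == "mm":
            largest /= 10.0  # normalize millimetres to centimetres
        sizes.append(round(largest, 2))
    return sizes


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(predicted: list, reference: list) -> tuple[float, float, float]:
    """Score extracted values against a reference standard.

    A prediction counts as a true positive when an equal value is present in
    the reference; duplicates are matched one-to-one via multiset intersection.
    """
    true_pos = sum((Counter(predicted) & Counter(reference)).values())
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    return precision, recall, f1_score(precision, recall)


if __name__ == "__main__":
    report = ("A. Sigmoid colon, polypectomy: tubular adenoma, 6 mm. "
              "B. Cecum: sessile serrated lesion, 1.2 x 0.8 cm.")
    print(extract_sizes_cm(report))                     # [0.6, 1.2]
    print(precision_recall_f1([0.6, 1.2], [0.6, 1.0]))  # (0.5, 0.5, 0.5)
    print(round(f1_score(0.989, 0.980), 3))             # 0.984
```

Applying the standard F1 definition to the overall precision (98.9%) and recall (98.0%) gives 2 × 0.989 × 0.980 / (0.989 + 0.980) ≈ 0.984, consistent with the reported F1-score of 98.4%. Exact-value matching is a simplification; a real evaluation would also need to attribute each size to the correct polyp within a multi-polyp report.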
Pages: 10
Source
JCO Clinical Cancer Informatics, 2023, 7