Selection of diagnosis with oncologic relevance information from histopathology free text reports: A machine learning approach

Cited by: 2
Authors
Viscosi, Carmelo [1]
Fidelbo, Paolo [1]
Benedetto, Andrea [1]
Varvara, Massimo [1]
Ferrante, Margherita [1]
Affiliation
[1] Azienda Osped Univ Policlin G Rodolico San Marco, Registro Tumori Integrato Catania Messina Enna, UOC Igiene, Dipartimento GF Ingrassia, Via S Sofia 87, I-95123 Catania, Italy
Keywords
Machine learning; Binary classification; Natural language processing; Cancer registry; Automated classification; Pathology
DOI
10.1016/j.ijmedinf.2022.104714
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Histopathology reports are a primary data source for the case definition phase of a cancer registry. By reading the histopathology report, the operator evaluating an oncology case can define the morphology and topography of the cancer and validate the case with the highest basis of diagnosis. The key problem for the Catania-Messina-Enna Integrated Cancer Registry (RTI) is that these reports are written in natural language, and the information relevant to cancer evaluation appears in only a small fraction of the histopathology reports produced each year. In this population-based retrospective cohort study, we aim to reduce the working time RTI operators spend searching for and selecting the relevant information among the histopathology reports of the eastern Sicily population. We developed a binary classifier on a training set of labeled historical data and validated its output on a test set of labeled data created by the operators over the years. Using a machine learning algorithm, we built a classification model that evaluates each free-text report and returns a score indicating the probability that it contains oncologically relevant information. Of the eight algorithms analyzed in this study, the best performing was LightGBM, which reached an F1 score of 98.9%. Using the chosen classifier, we shortened the time needed for case evaluation, improving the timeliness of cancer statistics.
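As a rough illustration of how an F1 score relates to the per-report probability scores described in the abstract, the following minimal sketch thresholds a list of scores and computes precision, recall, and F1 against operator labels. The function name `f1_at_threshold`, the 0.5 threshold, and the sample scores and labels are all hypothetical and are not taken from the study; the sketch does not reproduce the authors' LightGBM pipeline.

```python
def f1_at_threshold(scores, labels, threshold=0.5):
    """Return (precision, recall, f1) for binary labels given probability scores.

    scores: probabilities that a report contains oncologically relevant information
    labels: 1 if an operator marked the report as relevant, else 0
    threshold: hypothetical cut-off; the study does not state the one it used
    """
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up scores and labels, for illustration only (not data from the study).
scores = [0.97, 0.91, 0.08, 0.62, 0.15, 0.40]
labels = [1, 1, 0, 0, 0, 1]
precision, recall, f1 = f1_at_threshold(scores, labels)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Because F1 is the harmonic mean of precision and recall, it penalizes both relevant reports that are missed and irrelevant reports that are passed to operators, which is presumably why it suits a screening task where relevant reports are a minority.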
Pages: 5