Automated classification of cancer morphology from Italian pathology reports using Natural Language Processing techniques: A rule-based approach

Cited by: 17
Authors
Hammami, Linda [1]
Paglialonga, Alessia [2]
Pruneri, Giancarlo [3,4]
Torresani, Michele [5]
Sant, Milena [1]
Bono, Carlo [6]
Caiani, Enrico Gianluca [2,7]
Baili, Paolo [1]
Affiliations
[1] Fdn IRCCS Ist Nazl Tumori, Analyt Epidemiol & Hlth Impact Unit, Via Venezian 1, I-20133 Milan, Italy
[2] Natl Res Council Italy CNR, Inst Elect Comp & Telecommun Engn IEIIT, Milan, Italy
[3] Fdn IRCCS Ist Nazl Tumori, Pathol Dept, Milan, Italy
[4] Univ Milan, Sch Med, Milan, Italy
[5] Fdn IRCCS Ist Nazl Tumori, Hlth Direct, Milan, Italy
[6] Fdn IRCCS Ist Nazl Tumori, Milan, Italy
[7] Politecn Milan, Elect Informat & Biomed Engn Dept, Milan, Italy
Keywords
Natural Language Processing; Italian language; Pathology Reports; Cancer morphology
DOI
10.1016/j.jbi.2021.103712
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Pathology reports represent a primary source of information for cancer registries. Hospitals routinely process high volumes of free-text reports, a valuable source of information regarding cancer diagnosis for improving clinical care and supporting research. Information extraction and coding of unstructured textual data is typically a manual, labour-intensive process. There is a need to develop automated approaches to extract meaningful information from such texts in a reliable and accurate way. In this scenario, Natural Language Processing (NLP) algorithms offer a unique opportunity to automatically encode the unstructured reports into structured data, thus representing a potentially powerful alternative to expensive manual processing. However, notwithstanding the increasing interest in this area, the availability of NLP approaches for pathology reports in languages other than English, including Italian, is still limited. The aim of our work was to develop an automated algorithm based on NLP techniques, able to identify and classify the morphological content of pathology reports in the Italian language with micro-averaged performance scores higher than 95%. Specifically, a novel, domain-specific classifier that uses linguistic rules was developed and tested on 27,239 pathology reports from a single Italian oncological centre, following the International Classification of Diseases for Oncology morphology classification standard (ICD-O-M). The proposed classification algorithm achieved a micro-F1 score of 98.14% on the 9,594 pathology reports in the test dataset. This algorithm relies on rules defined on data from a single hospital that is specifically dedicated to cancer, but it is based on general processing steps which can be applied to different datasets. Further research will be important to demonstrate the generalizability of the proposed approach on a larger corpus from different hospitals.
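The micro-averaged F1 score mentioned in the abstract can be illustrated with a minimal sketch. It pools true positives, false positives, and false negatives across all classes before computing precision and recall. The function name `micro_f1`, the toy reports, and the choice of example morphology codes are assumptions made for illustration only; the abstract does not describe how the authors implemented their evaluation, and the numbers below are not taken from the paper.

```python
def micro_f1(gold, predicted):
    """Micro-averaged F1 over per-report label sets.

    gold and predicted are parallel lists; each element is the set of
    morphology codes assigned to one report (hypothetical data layout).
    """
    tp = fp = fn = 0
    for gold_codes, pred_codes in zip(gold, predicted):
        gold_codes, pred_codes = set(gold_codes), set(pred_codes)
        tp += len(gold_codes & pred_codes)   # codes correctly predicted
        fp += len(pred_codes - gold_codes)   # codes predicted but not in gold
        fn += len(gold_codes - pred_codes)   # gold codes that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example (made-up reports, illustrative codes only)
gold = [{"8140/3"}, {"8500/3"}, {"8070/3"}, {"8140/3"}]
pred = [{"8140/3"}, {"8500/3"}, {"8140/3"}, {"8140/3"}]
score = micro_f1(gold, pred)
```

When every report carries exactly one gold code and one predicted code, as in this toy example, micro-averaged precision, recall, and F1 all coincide with plain accuracy.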
Pages: 7