Automated classification of cancer morphology from Italian pathology reports using Natural Language Processing techniques: A rule-based approach

Times cited: 19
Authors
Linda, Hammami [1]
Alessia, Paglialonga [2]
Giancarlo, Pruneri [3,4]
Michele, Torresani [5]
Milena, Sant [1]
Carlo, Bono [6]
Enrico Gianluca, Caiani [2,7]
Paolo, Baili [1]
Affiliations
[1] Fdn IRCCS Ist Nazl Tumori, Analyt Epidemiol & Hlth Impact Unit, Via Venezian 1, I-20133 Milan, Italy
[2] Natl Res Council Italy CNR, Inst Elect Comp & Telecommun Engn IEIIT, Milan, Italy
[3] Fdn IRCCS Ist Nazl Tumori, Pathol Dept, Milan, Italy
[4] Univ Milan, Sch Med, Milan, Italy
[5] Fdn IRCCS Ist Nazl Tumori, Hlth Direct, Milan, Italy
[6] Fdn IRCCS Ist Nazl Tumori, Milan, Italy
[7] Politecn Milan, Elect Informat & Biomed Engn Dept, Milan, Italy
Keywords
Natural Language Processing; Italian language; Pathology reports; Cancer morphology
DOI
10.1016/j.jbi.2021.103712
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Pathology reports represent a primary source of information for cancer registries. Hospitals routinely process high volumes of free-text reports, a valuable source of information on cancer diagnosis for improving clinical care and supporting research. Extracting and coding information from unstructured textual data is typically a manual, labour-intensive process. Automated approaches are therefore needed to extract meaningful information from such texts reliably and accurately. In this scenario, Natural Language Processing (NLP) algorithms offer a unique opportunity to automatically encode unstructured reports into structured data, representing a potentially powerful alternative to expensive manual processing. However, despite increasing interest in this area, few NLP approaches are available to date for pathology reports in languages other than English, including Italian. The aim of our work was to develop an automated algorithm based on NLP techniques that identifies and classifies the morphological content of Italian-language pathology reports with micro-averaged performance scores higher than 95%. Specifically, a novel, domain-specific classifier that uses linguistic rules was developed and tested on 27,239 pathology reports from a single Italian oncological centre, following the International Classification of Diseases for Oncology morphology classification standard (ICD-O-M). The proposed classification algorithm achieved a micro-F1 score of 98.14% on the 9,594 pathology reports in the test dataset. The algorithm relies on rules defined on data from a single hospital specifically dedicated to cancer, but it is built on general processing steps that can be applied to other datasets. Further research will be needed to demonstrate the generalizability of the proposed approach on a larger corpus from different hospitals.
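The abstract rests on two technical ideas: mapping Italian report text to ICD-O-M morphology codes with linguistic rules, and scoring the result with a micro-averaged F1. The Python sketch below illustrates both under stated assumptions; it is not the authors' classifier. The rule table, the negation cues, the 40-character window and the names `normalise`, `is_negated`, `classify` and `micro_f1` are all hypothetical, and the six patterns are illustrative Italian phrasings mapped to standard ICD-O-3 morphology codes. Rules are ordered from most to least specific so that a phrase such as "carcinoma duttale infiltrante" claims its text span before the generic "carcinoma" rule can fire on it. Micro-F1 pools true positives, false positives and false negatives across all reports and codes before computing precision and recall.

```python
from __future__ import annotations

import re
import unicodedata

# Hypothetical rules: (regex over normalised Italian text, ICD-O-3 morphology code),
# ordered from most to least specific.
RULES = [
    (r"carcinoma duttale infiltrante", "8500/3"),  # infiltrating duct carcinoma, NOS
    (r"carcinoma lobulare", "8520/3"),  # lobular carcinoma, NOS
    (r"carcinoma (?:squamocellulare|a cellule squamose)", "8070/3"),  # squamous cell carcinoma, NOS
    (r"adenocarcinoma", "8140/3"),  # adenocarcinoma, NOS
    (r"melanoma", "8720/3"),  # malignant melanoma, NOS
    (r"carcinoma", "8010/3"),  # carcinoma, NOS (generic fallback)
]

# Hypothetical negation cues, searched in the left context of each match.
NEGATION = re.compile(r"\b(?:assenza di|negativo per|non si osserva|non evidenza di)\b")
WINDOW = 40  # characters of left context inspected (hypothetical value)


def normalise(text: str) -> str:
    """Lower-case, strip accents and collapse runs of whitespace."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.sub(r"\s+", " ", no_accents).strip()


def is_negated(text: str, start: int) -> bool:
    """True if a negation cue precedes `start` within the same sentence."""
    left = text[max(0, start - WINDOW):start]
    sentence_tail = re.split(r"[.;]", left)[-1]  # ignore cues from earlier sentences
    return NEGATION.search(sentence_tail) is not None


def classify(report: str) -> set[str]:
    """Return the ICD-O-3 morphology codes whose rules fire on a report."""
    text = normalise(report)
    consumed: list[tuple[int, int]] = []
    codes: set[str] = set()
    for pattern, code in RULES:
        for m in re.finditer(rf"\b{pattern}\b", text):
            start, end = m.span()
            if any(start < e and s < end for s, e in consumed):
                continue  # span already claimed by a more specific rule
            consumed.append((start, end))  # claim the span even if negated
            if not is_negated(text, start):
                codes.add(code)
    return codes


def micro_f1(gold: list[set[str]], pred: list[set[str]]) -> float:
    """Micro-averaged F1: pool TP, FP and FN over all reports and codes."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reports = [
        "Carcinoma duttale infiltrante, G2.",
        "Adenocarcinoma moderatamente differenziato.",
        "Assenza di carcinoma nel materiale esaminato.",
    ]
    for r in reports:
        print(r, "->", sorted(classify(r)))
```

With these toy rules, the three example reports yield `['8500/3']`, `['8140/3']` and `[]` respectively; the last mention is suppressed because "assenza di" precedes it in the same sentence. As a scoring example, `micro_f1([{"8500/3"}, {"8140/3"}], [{"8500/3"}, {"8010/3"}])` returns 0.5 (one true positive, one false positive, one false negative). A production rule set would have to be derived from real reports and cover the full ICD-O morphology axis.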
Pages: 7