Automated classification of cancer morphology from Italian pathology reports using Natural Language Processing techniques: A rule-based approach

Cited by: 19
Authors
Hammami, Lindaa [1]
Paglialonga, Alessia [2]
Pruneri, Giancarlo [3, 4]
Torresani, Michele [5]
Sant, Milenaa [1]
Bono, Carlo [6]
Caiani, Enrico Gianluca [2, 7]
Baili, Paolo [1]
Affiliations
[1] Fdn IRCCS Ist Nazl Tumori, Analyt Epidemiol & Hlth Impact Unit, Via Venezian 1, I-20133 Milan, Italy
[2] Natl Res Council Italy CNR, Inst Elect Comp & Telecommun Engn IEIIT, Milan, Italy
[3] Fdn IRCCS Ist Nazl Tumori, Pathol Dept, Milan, Italy
[4] Univ Milan, Sch Med, Milan, Italy
[5] Fdn IRCCS Ist Nazl Tumori, Hlth Direct, Milan, Italy
[6] Fdn IRCCS Ist Nazl Tumori, Milan, Italy
[7] Politecn Milan, Elect Informat & Biomed Engn Dept, Milan, Italy
Keywords
Natural Language Processing; Italian language; Pathology reports; Cancer morphology
DOI
10.1016/j.jbi.2021.103712
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Subject classification code
081203 ; 0835 ;
Abstract
Pathology reports are a primary source of information for cancer registries. Hospitals routinely process high volumes of free-text reports, which hold valuable information on cancer diagnosis for improving clinical care and supporting research. Extracting and coding information from unstructured text is typically a manual, labour-intensive process, so there is a need for automated approaches that extract meaningful information from such texts reliably and accurately. In this scenario, Natural Language Processing (NLP) algorithms offer an opportunity to automatically encode unstructured reports into structured data, and thus represent a potentially powerful alternative to expensive manual processing. However, despite increasing interest in this area, few NLP approaches are available to date for pathology reports in languages other than English, including Italian. The aim of our work was to develop an automated algorithm based on NLP techniques that can identify and classify the morphological content of pathology reports in the Italian language with micro-averaged performance scores higher than 95%. Specifically, a novel, domain-specific classifier that uses linguistic rules was developed and tested on 27,239 pathology reports from a single Italian oncological centre, following the International Classification of Diseases for Oncology morphology classification standard (ICD-O-M). The proposed classification algorithm achieved a micro-F1 score of 98.14% on the 9594 pathology reports in the test dataset. The algorithm relies on rules defined on data from a single hospital that is specifically dedicated to cancer, but it is based on general processing steps that can be applied to different datasets. Further research will be important to demonstrate the generalizability of the proposed approach on a larger corpus from different hospitals.
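The abstract names two technical ingredients: a classifier driven by linguistic rules and a micro-averaged F1 score. The sketch below illustrates both under stated assumptions and is not the authors' algorithm. The three regular-expression rules, the `UNKNOWN` label, and the helper names `classify` and `micro_f1` are hypothetical; the ICD-O-M codes used (8140/3, 8500/3, 8070/3) are standard codes chosen only for illustration.

```python
import re

# Hypothetical rules: (pattern on lower-cased Italian text, ICD-O-M code).
# Illustrative only; the paper's actual rule set is not reproduced here.
# Order matters: more specific patterns are tried first.
RULES = [
    (re.compile(r"carcinoma\s+duttale\s+infiltrante"), "8500/3"),
    (re.compile(r"carcinoma\s+squamo(so|cellulare)"), "8070/3"),
    (re.compile(r"adenocarcinoma"), "8140/3"),
]
UNKNOWN = "unclassified"  # assumed fallback label when no rule fires


def classify(report: str) -> str:
    """Return the code of the first rule that matches, else UNKNOWN."""
    text = report.lower()
    for pattern, code in RULES:
        if pattern.search(text):
            return code
    return UNKNOWN


def micro_f1(y_true, y_pred) -> float:
    """Micro-averaged F1: pool TP, FP and FN over all classes first."""
    tp = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp += 1
        else:
            fp += 1  # the predicted class gained a false positive
            fn += 1  # the true class gained a false negative
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


reports = [
    "Adenocarcinoma del colon",
    "Carcinoma duttale infiltrante della mammella",
    "Carcinoma squamoso",
    "Tessuto normale",
]
gold = ["8140/3", "8500/3", "8070/3", "8140/3"]
predicted = [classify(r) for r in reports]
score = micro_f1(gold, predicted)
```

In this single-label setting every error counts once as a false positive and once as a false negative, so the micro-averaged F1 equals plain accuracy; here three of the four toy reports are coded correctly.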
Pages: 7
Related papers
50 in total
  • [41] Automatic Lung Cancer Staging from Medical Reports Using Natural Language Processing
    Sui, X.
    Liu, T.
    Huang, Q.
    Hou, Y.
    Wang, Y.
    Kang, G.
    Guo, H.
    Li, N.
    Li, Y.
    Wang, Z.
    Wang, J.
    JOURNAL OF THORACIC ONCOLOGY, 2018, 13 (10) : S772 - S772
  • [42] Using Natural Language Processing to Extract Abnormal Results From Cancer Screening Reports
    Moore, Carlton R.
    Farrag, Ashraf
    Ashkin, Evan
    JOURNAL OF PATIENT SAFETY, 2017, 13 (03) : 138 - 143
  • [43] Identification of high-risk lesions through automated natural language processing (NLP) of pathology reports
    Ozanne, E. M.
    Shorko, J.
    Drohon, B.
    Grinstein, G.
    Hughes, K. S.
    CANCER RESEARCH, 2009, 69 (02) : 205S - 205S
  • [44] Programming techniques for improving rule readability for rule-based information extraction natural language processing pipelines of unstructured and semi-structured medical texts
    Ladas, Nektarios
    Borchert, Florian
    Franz, Stefan
    Rehberg, Alina
    Strauch, Natalia
    Sommer, Kim Katrin
    Marschollek, Michael
    Gietzelt, Matthias
    HEALTH INFORMATICS JOURNAL, 2023, 29 (02)
  • [45] Use of Natural Language Processing to Extract and Classify Papillary Thyroid Cancer Features From Surgical Pathology Reports
    Loor-Torres, Ricardo
    Wu, Yuqi
    Cabezas, Esteban
    Borras-Osorio, Mariana
    Toro-Tobon, David
    Duran, Mayra
    Al Zahidy, Misk
    Chavez, Maria Mateo
    Jacome, Cristian Soto
    Fan, Jungwei W.
    Ospina, Naykky M. Singh
    Wu, Yonghui
    Brito, Juan P.
    ENDOCRINE PRACTICE, 2024, 30 (11) : 1051 - 1058
  • [46] Automated Assessment of the Quality of Peer Reviews using Natural Language Processing Techniques
    Ramachandran L.
    Gehringer E.F.
    Yadav R.K.
    International Journal of Artificial Intelligence in Education, 2017, 27 (3) : 534 - 581
  • [47] Automated Genre Classification of Books Using Machine Learning and Natural Language Processing
    Gupta, Shikha
    Agarwal, Mohit
    Jain, Satbir
    2019 9TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING, DATA SCIENCE & ENGINEERING (CONFLUENCE 2019), 2019, : 269 - 272
  • [48] Natural Language Processing Methods and Techniques for Knowledge Extraction from School Reports
    Venturi, Giulia
    Dell'Orletta, Felice
    Montemagni, Simonetta
    Morini, Elettra
    Sagri, Maria Teresa
    CADMO, 2020, (02) : 49 - +
  • [49] Natural language processing for aviation safety reports: From classification to interactive analysis
    Tanguy, Ludovic
    Tulechki, Nikola
    Urieli, Assaf
    Hermann, Eric
    Raynal, Celine
    COMPUTERS IN INDUSTRY, 2016, 78 : 80 - 95
  • [50] Automated Classification of Quality Defect Issues Relating to Substandard Medicines Using a Hybrid Machine Learning and Rule-Based Approach
    Desmond Chun Hwee Teo
    Yiting Huang
    Sreemanee Raaj Dorajoo
    Michelle Sau Yuen Ng
    Chih Tzer Choong
    Doris Sock Tin Phuah
    Dorothy Hooi Myn Tan
    Filina Meixuan Tan
    Huilin Huang
    Maggie Siok Hwee Tan
    Suan Tian Koh
    Jalene Wang Woon Poh
    Pei San Ang
    Drug Safety, 2023, 46 : 975 - 989