Automated Classification of Free-text Pathology Reports for Registration of Incident Cases of Cancer

Cited by: 40
Authors
Jouhet, V. [1]
Defossez, G. [1]
Burgun, A. [2]
le Beux, P. [2]
Levillain, P. [3]
Ingrand, P. [1, 4]
Claveau, V. [5]
Affiliations
[1] Ctr Hosp Univ Poitiers, Fac Med, Unite Epidemiol Biostat & Registre Canc Poitou C, 6 Rue Miletrie, BP 199, F-86034 Poitiers, France
[2] Univ Rennes 1, Fac Med, INSERM, U936, Rennes, France
[3] Univ Poitiers, Fac Med, Ctr Regrp Informat & Stat Anatomopathol Poitou Ch, F-86034 Poitiers, France
[4] INSERM, CIC 802, Poitiers, France
[5] IRISA, CNRS, UMR 6074, Rennes, France
Keywords
Medical Informatics; neoplasm; pathology; free text; automated classification; MEDICAL INFORMATICS; HIGH AGREEMENT; LOW KAPPA;
DOI
10.3414/ME11-01-0005
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Objective: Our study aimed to construct and evaluate functions called "classifiers", produced by supervised machine learning techniques, to automatically categorize pathology reports using solely their content.
Methods: Patients from the Poitou-Charentes Cancer Registry having at least one pathology report and a single non-metastatic invasive neoplasm were included. A descriptor weighting function accounting for the distribution of terms among the targeted classes was developed and compared to classic methods based on inverse document frequencies. Classification was performed with support vector machine (SVM) and Naive Bayes classifiers. Two levels of granularity were tested for both the topographical and the morphological axes of the ICD-O-3 code. The ability to correctly attribute a precise ICD-O-3 code and the ability to attribute the broad category defined by the International Agency for Research on Cancer (IARC) for the multiple primary cancer registration rules were evaluated using F1-measures.
Results: 5121 pathology reports produced by 35 pathologists were selected. The best performance was achieved by our class-weighted descriptor combined with an SVM classifier. Using this method, the pathology reports were correctly classified into the IARC categories with F1-measures of 0.967 for both topography and morphology. ICD-O-3 code attribution had lower performance, with an F1-measure of 0.715 for topography and 0.854 for morphology.
Conclusion: These results suggest that free-text pathology reports could be useful as a data source for automated systems that identify and notify new cases of cancer. Future work is needed to evaluate the improvement in performance obtained from the use of natural language processing, including the case of multiple tumor descriptions and the possible incorporation of other medical documents such as surgical reports.
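The abstract reports its results as F1-measures but does not say how they were averaged across classes. As a minimal sketch of the metric only, and not a reproduction of the authors' evaluation, the following Python function computes per-class, macro-averaged, and micro-averaged F1 for single-label predictions. The function name `f1_scores` and the four toy labels (topography-style codes) are assumptions made for illustration and are not data from the study.

```python
from collections import Counter


def f1_scores(gold, predicted):
    """Return (per_class_f1, macro_f1, micro_f1) for single-label classification.

    Hypothetical helper for illustration; not the evaluation code of the study.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1          # correct label: true positive for that class
        else:
            fp[p] += 1          # predicted class gains a false positive
            fn[g] += 1          # gold class gains a false negative

    labels = sorted(set(gold) | set(predicted))
    per_class = {}
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN), defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        per_class[label] = 2 * tp[label] / denom if denom else 0.0

    # Macro average: unweighted mean of the per-class scores
    macro = sum(per_class.values()) / len(labels) if labels else 0.0

    # Micro average: pool the counts over all classes before computing F1
    total_tp = sum(tp.values())
    micro_denom = 2 * total_tp + sum(fp.values()) + sum(fn.values())
    micro = 2 * total_tp / micro_denom if micro_denom else 0.0

    return per_class, macro, micro


if __name__ == "__main__":
    # Toy labels, invented for this example
    gold = ["C50", "C50", "C18", "C34"]
    predicted = ["C50", "C18", "C18", "C34"]
    per_class, macro, micro = f1_scores(gold, predicted)
    print(per_class, round(macro, 3), round(micro, 3))
```

With one label per report, the micro-averaged F1 equals plain accuracy, while the macro average gives rare classes the same weight as frequent ones. That difference matters when many fine-grained codes have few examples.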
Pages: 242-251
Page count: 10
Related papers
50 in total
  • [21] Automated Classification of Pathology Reports
    Oleynik, Michel
    Finger, Marcelo
    Patrao, Diogo F. C.
    MEDINFO 2015: EHEALTH-ENABLED HEALTH, 2015, 216 : 1040 - 1040
  • [22] Evaluating Methods for Identifying Cancer in Free-Text Pathology Reports Using Various Machine Learning and Data Preprocessing Approaches
    Kasthurirathne, Suranga Nath
    Dixon, Brian E.
    Grannis, Shaun J.
    MEDINFO 2015: EHEALTH-ENABLED HEALTH, 2015, 216 : 1070 - 1070
  • [23] Machine Learning-Based Extraction of Breast Cancer Receptor Status From Bilingual Free-Text Pathology Reports
    Pironet, Antoine
    Poirel, Helene A.
    Tambuyzer, Tim
    De Schutter, Harlinde
    van Walle, Lien
    Mattheijssens, Joris
    Henau, Kris
    Van Eycken, Liesbet
    Van Damme, Nancy
    FRONTIERS IN DIGITAL HEALTH, 2021, 3
  • [24] Deep learning for natural language processing of free-text pathology reports: a comparison of learning curves
    Senders, Joeky T.
    Cote, David J.
    Mehrtash, Alireza
    Wiemann, Robert
    Gormley, William B.
    Smith, Timothy R.
    Broekman, Marike L. D.
    Arnaout, Omar
    BMJ INNOVATIONS, 2020, 6 (04) : 192 - 198
  • [25] Identifying risks areas related to medication administrations - text mining analysis using free-text descriptions of incident reports
    Marja Härkänen
    Jussi Paananen
    Trevor Murrells
    Anne Marie Rafferty
    Bryony Dean Franklin
    BMC Health Services Research, 19
  • [26] Extracting Information from Free-text Mammography Reports
    Esuli, Andrea
    Marcheggiani, Diego
    Sebastiani, Fabrizio
    ERCIM NEWS, 2010, (82): : 60 - 61
  • [27] Natural Language Processing for Automated Quantification of Brain Metastases Reported in Free-Text Radiology Reports
    Senders, Joeky T.
    Karhade, Aditya V.
    Cote, David J.
    Mehrtash, Alireza
    Lamba, Nayan
    DiRisio, Aislyn
    Muskens, Ivo S.
    Gormley, William B.
    Smith, Timothy R.
    Broekman, Marike L. D.
    Arnaout, Omar
    JCO CLINICAL CANCER INFORMATICS, 2019, 3 : 1 - 9
  • [28] Deep Learning to Classify Radiology Free-Text Reports
    Chen, Matthew C.
    Ball, Robyn L.
    Yang, Lingyao
    Moradzadeh, Nathaniel
    Chapman, Brian E.
    Larson, David B.
    Langlotz, Curtis P.
    Amrhein, Timothy J.
    Lungren, Matthew P.
    RADIOLOGY, 2018, 286 (03) : 845 - 852
  • [29] Automated free-text assessment: some lessons learned
    Dessus, Philippe
    Lemaire, Benoit
    Loiseau, Mathieu
    Mandin, Sonia
    Villiot-Leclercq, Emmanuelle
    INTERNATIONAL JOURNAL OF CONTINUING ENGINEERING EDUCATION AND LIFE-LONG LEARNING, 2011, 21 (2-3) : 140 - 154
  • [30] Multi-class classification of cancer stages from free-text histology reports using support vector machines
    Nguyen, Anthony
    Moore, Darren
    McCowan, Lain
    Courage, Mary-Jane
    2007 ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY, VOLS 1-16, 2007, : 5140 - 5143