Automated Classification of Free-text Pathology Reports for Registration of Incident Cases of Cancer

Times cited: 40
Authors
Jouhet, V. [1 ]
Defossez, G. [1 ]
Burgun, A. [2 ]
le Beux, P. [2 ]
Levillain, P. [3 ]
Ingrand, P. [1 ,4 ]
Claveau, V. [5 ]
Affiliations
[1] Ctr Hosp Univ Poitiers, Fac Med, Unite Epidemiol Biostat & Registre Canc Poitou C, 6 Rue Miletrie,BP 199, F-86034 Poitiers, France
[2] Univ Rennes 1, Fac Med, INSERM, U936, Rennes, France
[3] Univ Poitiers, Fac Med, Ctr Regrp Informat & Stat Anatomopathol Poitou Ch, F-86034 Poitiers, France
[4] INSERM, CIC 802, Poitiers, France
[5] IRISA, CNRS, UMR 6074, Rennes, France
Keywords
Medical Informatics; neoplasm; pathology; free text; automated classification; HIGH AGREEMENT; LOW KAPPA
DOI
10.3414/ME11-01-0005
Chinese Library Classification (CLC)
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Objective: Our study aimed to construct and evaluate functions called "classifiers", produced by supervised machine learning techniques, to automatically categorize pathology reports using only their content.

Methods: Patients from the Poitou-Charentes Cancer Registry with at least one pathology report and a single non-metastatic invasive neoplasm were included. A descriptor weighting function that accounts for the distribution of terms among the target classes was developed and compared with classic methods based on inverse document frequency. Classification was performed with support vector machine (SVM) and Naive Bayes classifiers. Two levels of granularity were tested for both the topographical and the morphological axes of the ICD-O-3 code. Two abilities were evaluated using F1-measures: attributing a precise ICD-O-3 code, and attributing the broad category defined by the International Agency for Research on Cancer (IARC) for the multiple primary cancer registration rules.

Results: 5121 pathology reports produced by 35 pathologists were selected. The best performance was achieved by our class-weighted descriptor combined with an SVM classifier. With this method, pathology reports were correctly classified into the IARC categories with an F1-measure of 0.967 for both topography and morphology. ICD-O-3 code attribution performed less well, with F1-measures of 0.715 for topography and 0.854 for morphology.

Conclusion: These results suggest that free-text pathology reports could serve as a data source for automated systems that identify and notify new cases of cancer. Future work is needed to evaluate the performance gain obtained from natural language processing, including the handling of reports that describe multiple tumors, and the possible incorporation of other medical documents such as surgical reports.
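The abstract contrasts a class-aware descriptor weighting with classic inverse document frequency (IDF) weighting and scores the classifiers with the F1-measure. The Python sketch below illustrates those ideas on a toy corpus, under stated assumptions: `class_entropy_weights` is a hypothetical class-distribution weight (one minus the normalized entropy of a term's spread over classes) and is not the authors' published function; F1 is macro-averaged over classes, which the abstract does not specify; the tokens and labels are made up; and SVM or Naive Bayes training is omitted.

```python
import math
from collections import Counter, defaultdict


def idf_weights(docs):
    """Classic inverse document frequency: idf(t) = log(N / df(t))."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    return {t: math.log(n / d) for t, d in df.items()}


def class_entropy_weights(docs, labels):
    """Hypothetical class-aware weight (illustrative, NOT the paper's function):
    w(t) = 1 - H(class | t) / log(K), where H is the entropy of the distribution
    of documents containing t across the K classes. Terms found in a single
    class get weight 1; terms spread evenly across all classes get weight 0."""
    k = len(set(labels))
    per_term = defaultdict(Counter)  # term -> class -> number of documents
    for tokens, label in zip(docs, labels):
        for t in set(tokens):
            per_term[t][label] += 1
    weights = {}
    for t, by_class in per_term.items():
        total = sum(by_class.values())
        h = -sum((c / total) * math.log(c / total) for c in by_class.values())
        weights[t] = 1.0 - h / math.log(k) if k > 1 else 1.0
    return weights


def per_class_f1(y_true, y_pred):
    """F1 = 2PR / (P + R), computed separately for each class."""
    scores = {}
    for c in set(y_true) | set(y_pred):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores[c] = 2 * precision * recall / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (the averaging scheme is an assumption)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    # Toy tokenized "reports" with topography labels, made up for illustration.
    docs = [["adenocarcinoma", "colon"], ["adenocarcinoma", "breast"],
            ["ductal", "breast"], ["colon", "polyp"]]
    labels = ["colon", "breast", "breast", "colon"]
    idf = idf_weights(docs)
    cw = class_entropy_weights(docs, labels)
    # IDF gives both terms the same weight; the class-aware weight separates them.
    print(idf["adenocarcinoma"], idf["colon"])
    print(cw["adenocarcinoma"], cw["colon"])
    print(macro_f1(labels, ["colon", "breast", "colon", "colon"]))
```

On the toy data, IDF assigns "adenocarcinoma" and "colon" the same weight (log 2), whereas the class-aware weight gives approximately 0 to "adenocarcinoma", which occurs in both classes, and 1 to "colon", which occurs in only one. In a full pipeline such weights would typically scale term frequencies before the vectors are passed to a classifier.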
Pages: 242-251
Number of pages: 10
Related papers
50 in total
  • [1] CANCER REPORTING FROM OCR FREE-TEXT PATHOLOGY REPORTS
    Zuccon, Guido
    Nguyen, Anthony
    Bergheim, Anton
    Grayson, Narelle
    ASIA-PACIFIC JOURNAL OF CLINICAL ONCOLOGY, 2012, 8 : 327 - 328
  • [2] Classification of cancer stage from free-text histology reports
    McCowan, Iain
    Moore, Darren
    Fry, Mary-Jane
    2006 28TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY, VOLS 1-15, 2006 : 922 - +
  • [3] Symbolic rule-based classification of lung cancer stages from free-text pathology reports
    Nguyen, Anthony N.
    Lawley, Michael J.
    Hansen, David P.
    Bowman, Rayleen V.
    Clarke, Belinda E.
    Duhig, Edwina E.
    Colquist, Shoni
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2010, 17 (04) : 440 - 445
  • [4] Automatic Extraction of Cancer Characteristics from Free-Text Pathology Reports for Cancer Notifications
    Nguyen, Anthony
    Moore, Julie
    Lawley, Michael
    Hansen, David
    Colquist, Shoni
    HEALTH INFORMATICS: THE TRANSFORMATIVE POWER OF INNOVATION, 2011, 168 : 117 - 124
  • [5] The registry case finding engine: An automated tool to identify cancer cases from unstructured, free-text pathology reports and clinical notes
    Hanauer, David A.
    Miela, Gretchen
    Chinnaiyan, Arul M.
    Chang, Alfred E.
    Blayney, Douglas W.
    JOURNAL OF THE AMERICAN COLLEGE OF SURGEONS, 2007, 205 (05) : 690 - 697
  • [6] A Text Mining Approach in the Classification of Free-Text Cancer Pathology Reports from the South African National Health Laboratory Services
    Achilonu, Okechinyere J.
    Olago, Victor
    Singh, Elvira
    Eijkemans, Rene M. J. C.
    Nimako, Gideon
    Musenge, Eustasius
    INFORMATION, 2021, 12 (11)
  • [7] Automated Organ-Level Classification of Free-Text Pathology Reports to Support a Radiology Follow-up Tracking Engine
    Steinkamp, Jackson M.
    Chambers, Charles M.
    Lalevic, Darco
    Zafar, Hanna M.
    Cook, Tessa S.
    RADIOLOGY-ARTIFICIAL INTELLIGENCE, 2019, 1 (05)
  • [8] Sentence-based Classification of Free-text Breast Cancer Radiology Reports
    Maghsoodi, Aisan
    Sevenster, Merlijn
    Scholtes, Johannes
    Nalbantov, Georgi
    2012 25TH INTERNATIONAL SYMPOSIUM ON COMPUTER-BASED MEDICAL SYSTEMS (CBMS), 2012
  • [9] Automated Information Extraction from Free-Text EEG Reports
    Biswal, Siddharth
    Nip, Zarina
    Moura Junior, Valdery
    Bianchi, Matt T.
    Rosenthal, Eric S.
    Westover, M. Brandon
    2015 37TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC), 2015 : 6804 - 6807
  • [10] Automated Classification of Selected Data Elements from Free-text Diagnostic Reports for Clinical Research
    Loepprich, Martin
    Krauss, Felix
    Ganzinger, Matthias
    Senghas, Karsten
    Riezler, Stefan
    Knaup, Petra
    METHODS OF INFORMATION IN MEDICINE, 2016, 55 (04) : 373 - 380