Automated Classification of Free-text Pathology Reports for Registration of Incident Cases of Cancer

Cited by: 40
Authors
Jouhet, V. [1]
Defossez, G. [1]
Burgun, A. [2]
le Beux, P. [2]
Levillain, P. [3]
Ingrand, P. [1,4]
Claveau, V. [5]
Affiliations
[1] Ctr Hosp Univ Poitiers, Fac Med, Unite Epidemiol Biostat & Registre Canc Poitou C, 6 Rue Miletrie, BP 199, F-86034 Poitiers, France
[2] Univ Rennes 1, Fac Med, INSERM, U936, Rennes, France
[3] Univ Poitiers, Fac Med, Ctr Regrp Informat & Stat Anatomopathol Poitou Ch, F-86034 Poitiers, France
[4] INSERM, CIC 802, Poitiers, France
[5] IRISA, CNRS, UMR 6074, Rennes, France
Keywords
Medical Informatics; neoplasm; pathology; free text; automated classification; MEDICAL INFORMATICS; HIGH AGREEMENT; LOW KAPPA;
DOI
10.3414/ME11-01-0005
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Objective: Our study aimed to construct and evaluate functions called "classifiers", produced by supervised machine learning techniques, to automatically categorize pathology reports using only their content. Methods: Patients from the Poitou-Charentes Cancer Registry with at least one pathology report and a single non-metastatic invasive neoplasm were included. A descriptor weighting function accounting for the distribution of terms among targeted classes was developed and compared with classic methods based on inverse document frequencies. Classification was performed with support vector machine (SVM) and Naive Bayes classifiers. Two levels of granularity were tested for both the topographical and the morphological axes of the ICD-O-3 code. The ability to correctly attribute a precise ICD-O-3 code and the ability to attribute the broad category defined by the International Agency for Research on Cancer (IARC) for the multiple primary cancer registration rules were evaluated using F1-measures. Results: 5121 pathology reports produced by 35 pathologists were selected. The best performance was achieved by our class-weighted descriptor combined with an SVM classifier. Using this method, the pathology reports were properly classified in the IARC categories with F1-measures of 0.967 for both topography and morphology. ICD-O-3 code attribution performed less well, with an F1-measure of 0.715 for topography and 0.854 for morphology. Conclusion: These results suggest that free-text pathology reports could be useful as a data source for automated systems that identify and notify new cases of cancer. Future work is needed to evaluate the improvement in performance obtained from the use of natural language processing, including the case of multiple tumor descriptions and the possible incorporation of other medical documents such as surgical reports.
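For readers unfamiliar with the two quantities named in the abstract, the following is a minimal Python sketch of a classic inverse document frequency weight and of the per-class F1-measure. It is an illustration under stated assumptions, not the authors' implementation: the function names (`idf_weights`, `f1_per_class`, `macro_f1`), the toy tokens, and the toy labels are hypothetical, the abstract does not give the formula of the class-weighted descriptor (so only the plain IDF baseline is shown), and it does not say how F1 was averaged across classes (macro-averaging is used here as one common choice).

```python
import math
from collections import Counter


def idf_weights(documents):
    """Classic inverse document frequency: log(N / df(term)).

    `documents` is a list of token lists. This is the textbook baseline,
    not the class-weighted descriptor developed in the study.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def f1_per_class(y_true, y_pred):
    """Per-class F1 = 2 * P * R / (P + R), from true/false positives."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (an assumed averaging scheme)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Toy data (hypothetical): tokenized report snippets and topography labels.
toy_docs = [
    ["adenocarcinoma", "colon"],
    ["carcinoma", "breast"],
    ["adenocarcinoma", "breast"],
]
toy_true = ["C18", "C50", "C50", "C18"]
toy_pred = ["C18", "C50", "C18", "C18"]

weights = idf_weights(toy_docs)
per_class = f1_per_class(toy_true, toy_pred)
overall = macro_f1(toy_true, toy_pred)
```

In this toy run a term found in one of three documents ("colon") gets a higher IDF weight than a term found in two ("breast"). A class-aware weighting, as described in the abstract, would additionally take into account how a term is distributed among the target classes.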
Pages: 242-251
Page count: 10