Automating classification of free-text electronic health records for epidemiological studies

Cited by: 19
Authors
Schuemie, Martijn J. [1]
Sen, Emine [1]
't Jong, Geert W. [1]
van Soest, Eva M. [1]
Sturkenboom, Miriam C. [1]
Kors, Jan A. [1]
Affiliations
[1] Erasmus MC, Dept Med Informat, NL-3000 CA Rotterdam, Netherlands
Keywords
free text; text mining; case definition; machine learning; method; QUALITY;
DOI
10.1002/pds.3205
Chinese Library Classification number
R1 [Preventive Medicine, Hygiene]
Discipline classification codes
1004; 120402
Abstract
Purpose: Increasingly, patient information is stored in electronic medical records, which could be reused for research. Often these records comprise unstructured narrative data, which are cumbersome to analyze. The authors investigated whether text mining can make these data suitable for epidemiological studies and compared a concept recognition approach with a range of machine learning techniques that require a manually annotated training set. The authors show how this training set can be created with minimal effort by using a broad database query.
Methods: The approaches were tested on two data sets: a publicly available set of English radiology reports to which International Classification of Diseases, Ninth Revision, Clinical Modification codes needed to be assigned, and a set of Dutch GP records that needed to be classified as either liver disorder cases or noncases. Performance was tested against a manually created gold standard.
Results: The best overall performance was achieved by combining a manually created filter for removing negations and speculations with rule learning algorithms such as RIPPER, with high scores on both the radiology reports (positive predictive value = 0.88, sensitivity = 0.85, specificity = 1.00) and the GP records (positive predictive value = 0.89, sensitivity = 0.91, specificity = 0.76).
Conclusions: Although a training set still needs to be created manually, text mining can help reduce the amount of manual work needed to incorporate narrative data in an epidemiological study and will make the data extraction more reproducible. An advantage of machine learning is that it is able to pick up specific language use, such as abbreviations and synonyms used by physicians.
Copyright (C) 2012 John Wiley & Sons, Ltd.
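The three measures reported in the abstract follow the standard confusion-matrix definitions. The sketch below shows how they could be computed when comparing classifier output with a gold standard. The function name and the counts are hypothetical illustrations and are not taken from the paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute PPV, sensitivity, and specificity from confusion-matrix counts.

    tp/fp/tn/fn are counts of true positives, false positives,
    true negatives, and false negatives against a gold standard.
    """
    nan = float("nan")
    # Positive predictive value: fraction of predicted cases that are true cases.
    ppv = tp / (tp + fp) if (tp + fp) else nan
    # Sensitivity: fraction of true cases that the classifier found.
    sensitivity = tp / (tp + fn) if (tp + fn) else nan
    # Specificity: fraction of true noncases that were correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else nan
    return {"ppv": ppv, "sensitivity": sensitivity, "specificity": specificity}


# Hypothetical counts for illustration only (not the paper's data).
metrics = classification_metrics(tp=90, fp=10, tn=80, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

A specificity well below the sensitivity, as reported for the GP records, would correspond to a comparatively large false-positive count relative to the number of true noncases.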
Pages: 651-658
Number of pages: 8