Automating classification of free-text electronic health records for epidemiological studies

Cited by: 19
Authors
Schuemie, Martijn J. [1]
Sen, Emine [1]
't Jong, Geert W. [1]
van Soest, Eva M. [1]
Sturkenboom, Miriam C. [1]
Kors, Jan A. [1]
Affiliations
[1] Erasmus MC, Dept Med Informat, NL-3000 CA Rotterdam, Netherlands
Keywords
free text; text mining; case definition; machine learning; method; QUALITY;
DOI
10.1002/pds.3205
Chinese Library Classification number
R1 [Preventive Medicine, Hygiene]
Subject classification codes
1004; 120402
Abstract
Purpose: Increasingly, patient information is stored in electronic medical records, which could be reused for research. Often these records comprise unstructured narrative data, which are cumbersome to analyze. The authors investigated whether text mining can make these data suitable for epidemiological studies and compared a concept recognition approach with a range of machine learning techniques that require a manually annotated training set. The authors show how this training set can be created with minimal effort by using a broad database query.

Methods: The approaches were tested on two data sets: a publicly available set of English radiology reports to which International Classification of Diseases, Ninth Revision, Clinical Modification codes needed to be assigned, and a set of Dutch GP records that needed to be classified as either liver disorder cases or noncases. Performance was tested against a manually created gold standard.

Results: The best overall performance was achieved by combining a manually created filter for removing negations and speculations with rule learning algorithms such as RIPPER. This combination scored high on both the radiology reports (positive predictive value = 0.88, sensitivity = 0.85, specificity = 1.00) and the GP records (positive predictive value = 0.89, sensitivity = 0.91, specificity = 0.76).

Conclusions: Although a training set still needs to be created manually, text mining can help reduce the amount of manual work needed to incorporate narrative data in an epidemiological study and will make the data extraction more reproducible. An advantage of machine learning is that it is able to pick up specific language use, such as abbreviations and synonyms used by physicians. Copyright (C) 2012 John Wiley & Sons, Ltd.
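For readers less familiar with the three measures reported in the abstract, the following is a minimal sketch of how positive predictive value, sensitivity, and specificity are conventionally derived from a comparison against a gold standard. The function name `classification_metrics` and the counts in the example are hypothetical and chosen only for illustration; they are not taken from the paper and do not reproduce its results.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute PPV, sensitivity, and specificity from confusion-matrix counts.

    tp: records classified as cases that are cases in the gold standard
    fp: records classified as cases that are noncases in the gold standard
    tn: records classified as noncases that are noncases in the gold standard
    fn: records classified as noncases that are cases in the gold standard
    """
    # Each ratio is undefined when its denominator is zero; return NaN then.
    ppv = tp / (tp + fp) if (tp + fp) else math.nan
    sensitivity = tp / (tp + fn) if (tp + fn) else math.nan
    specificity = tn / (tn + fp) if (tn + fp) else math.nan
    return {"ppv": ppv, "sensitivity": sensitivity, "specificity": specificity}


# Hypothetical counts for illustration only (not data from the study).
metrics = classification_metrics(tp=90, fp=10, tn=80, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Reporting all three together matters here because a classifier can achieve a high positive predictive value while still missing many true cases, or the reverse.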
Pages: 651-658
Page count: 8