Automating classification of free-text electronic health records for epidemiological studies

Cited by: 19
Authors
Schuemie, Martijn J. [1]
Sen, Emine [1]
't Jong, Geert W. [1]
van Soest, Eva M. [1]
Sturkenboom, Miriam C. [1]
Kors, Jan A. [1]
Affiliations
[1] Erasmus MC, Dept Med Informat, NL-3000 CA Rotterdam, Netherlands
Keywords
free text; text mining; case definition; machine learning; method; QUALITY;
DOI
10.1002/pds.3205
Chinese Library Classification (CLC) number
R1 [Preventive medicine, hygiene];
Discipline classification code
1004 ; 120402 ;
Abstract
Purpose: Increasingly, patient information is stored in electronic medical records, which could be reused for research. Often these records comprise unstructured narrative data, which are cumbersome to analyze. The authors investigated whether text mining can make these data suitable for epidemiological studies and compared a concept recognition approach and a range of machine learning techniques that require a manually annotated training set. The authors show how this training set can be created with minimal effort by using a broad database query.
Methods: The approaches were tested on two data sets: a publicly available set of English radiology reports to which International Classification of Diseases, Ninth Revision, Clinical Modification codes needed to be assigned, and a set of Dutch GP records that needed to be classified as either liver disorder cases or noncases. Performance was tested against a manually created gold standard.
Results: The best overall performance was achieved by a combination of a manually created filter for removing negations and speculations and rule learning algorithms such as RIPPER, with high scores on both the radiology reports (positive predictive value = 0.88, sensitivity = 0.85, specificity = 1.00) and the GP records (positive predictive value = 0.89, sensitivity = 0.91, specificity = 0.76).
Conclusions: Although a training set still needs to be created manually, text mining can help reduce the amount of manual work needed to incorporate narrative data in an epidemiological study and will make the data extraction more reproducible. An advantage of machine learning is that it is able to pick up specific language use, such as abbreviations and synonyms used by physicians. Copyright (C) 2012 John Wiley & Sons, Ltd.
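The abstract reports positive predictive value, sensitivity, and specificity against a gold standard, after filtering negated and speculative statements. The sketch below illustrates those two ideas only. It is not the authors' implementation: the cue list, the sentence-level filtering rule, and the function names strip_negated_sentences and evaluate are all assumptions made for illustration, and the labels in the usage note are made up.

```python
import math
import re

# Hypothetical cue list (assumption): the paper's manually created
# negation/speculation filter is not reproduced in this record.
NEGATION_CUES = ("no", "not", "without", "denies", "possible", "suspected")
CUE_PATTERN = re.compile(r"\b(" + "|".join(NEGATION_CUES) + r")\b", re.IGNORECASE)


def strip_negated_sentences(text):
    """Drop every sentence that contains a negation or speculation cue.

    This is a deliberately crude, sentence-level rule used only to show
    where such a filter sits in the pipeline.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(s for s in sentences if not CUE_PATTERN.search(s))


def evaluate(gold, predicted):
    """Compare predicted case labels with gold-standard labels.

    Returns (positive predictive value, sensitivity, specificity).
    A metric with a zero denominator is returned as NaN.
    """
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)          # true positives
    fp = sum(1 for g, p in pairs if not g and p)      # false positives
    fn = sum(1 for g, p in pairs if g and not p)      # false negatives
    tn = sum(1 for g, p in pairs if not g and not p)  # true negatives

    def ratio(num, den):
        return num / den if den else math.nan

    return ratio(tp, tp + fp), ratio(tp, tp + fn), ratio(tn, tn + fp)
```

For example, strip_negated_sentences("No pneumonia. Cough present.") keeps only the second sentence, and evaluate([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1]) gives 0.75 for all three measures on that toy input.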
Pages: 651-658
Number of pages: 8