Detecting MRSA Infections by Fusing Structured and Unstructured Electronic Health Record Data

Cited by: 2
Authors
Hartvigsen, Thomas [1]
Sen, Cansu [1]
Rundensteiner, Elke A. [1]
Affiliations
[1] Worcester Polytech Inst, Worcester, MA 01609 USA
Keywords
MRSA; Machine learning; Early prediction; Feature fusion; BIG DATA; PREDICTION; DISEASE;
DOI
10.1007/978-3-030-29196-9_21
Chinese Library Classification
R318 [Biomedical Engineering]
Discipline classification code
0831
Abstract
Methicillin-resistant Staphylococcus aureus (MRSA), an antibiotic-resistant bacterium, is a common cause of one of the more devastating hospital-acquired infections (HAI) in the United States. In this work, we study the practicality of leveraging machine learning methods for early detection of MRSA infections based on a rich variety of patient information commonly available in modern Electronic Health Records (EHR). We explore heterogeneous types of data in EHRs, including on-admission demographics, throughout-stay time series, and free-form clinical notes. On-admission data capture non-clinical information (e.g., age, marital status), while throughout-stay data include vital signs, medications, laboratory studies, and other clinical assessments. Clinical notes, free-form text documents created by medical professionals, contain expert observations about patients. Our proposed system extracts features from each data type and generates a dense patient-level representation for each. It then generates a score for each patient, indicating their risk of acquiring MRSA. We evaluate the prediction performance achieved by core machine learning methods, namely Logistic Regression, Support Vector Machine, and Random Forest, when mining these different types of EHR data retrospectively to detect patterns predictive of MRSA infection. We evaluate classification performance using MIMIC-III, a critical care data set comprising 12 years of patient records from the Beth Israel Deaconess Medical Center Intensive Care Unit in Boston, MA. Our experiments show that while all types of data contain predictive signals, the fusion of all data sources leads to the best prediction performance.
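The fusion-then-score pipeline described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not state how the per-source representations are combined, so simple concatenation (early fusion) is used here, the data are synthetic rather than drawn from MIMIC-III, and the helper names `fuse`, `train_logistic`, and `risk_score` are hypothetical. A small hand-written logistic regression stands in for the classifiers named in the abstract.

```python
import math
import random


def fuse(*feature_blocks):
    """Early fusion (assumed): concatenate one patient's per-source vectors."""
    fused = []
    for block in feature_blocks:
        fused.extend(block)
    return fused


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit logistic regression weights with plain batch gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def risk_score(w, b, x):
    """Return a score in [0, 1]; higher means higher predicted risk."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Synthetic stand-ins for the three data types (not real patient data).
random.seed(0)
patients, labels = [], []
for i in range(40):
    label = i % 2
    demographics = [random.gauss(0.3 * label, 1.0)]  # on-admission features
    time_series = [random.gauss(0.8 * label, 1.0), random.gauss(0.0, 1.0)]  # throughout-stay summary
    notes = [random.gauss(0.8 * label, 1.0)]  # clinical-note representation
    patients.append(fuse(demographics, time_series, notes))
    labels.append(label)

w, b = train_logistic(patients, labels)
scores = [risk_score(w, b, x) for x in patients]
```

In this sketch each patient ends up with a single fused vector and one risk score. Comparing single-source against fused inputs, as the abstract describes, would amount to calling `train_logistic` on each feature block separately and on the fused vectors, then evaluating on held-out patients.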
Pages: 399-419
Page count: 21