Assessing stroke severity using electronic health record data: a machine learning approach

Cited by: 51
Authors
Kogan, Emily [1]
Twyman, Kathryn [1]
Heap, Jesse [1]
Milentijevic, Dejan [2]
Lin, Jennifer H. [2]
Alberts, Mark [3]
Affiliations
[1] Janssen Res & Dev LLC, Raritan, NJ 08869 USA
[2] Janssen Sci Affairs LLC, Titusville, NJ USA
[3] Hartford HealthCare, Hartford, CT USA
Keywords
Database; Outcomes research; Real-world evidence; ISCHEMIC-STROKE; DOUBLE-BLIND; OUTCOMES; PHASE-3; SCALE; CARE
DOI
10.1186/s12911-019-1010-x
Chinese Library Classification number
R-058
Abstract
Background Stroke severity is an important predictor of patient outcomes and is commonly measured with the National Institutes of Health Stroke Scale (NIHSS). Because NIHSS scores are often recorded as free text in physician reports, structured real-world evidence databases seldom include severity. The aim of this study was to use machine learning models to impute NIHSS scores for all patients with newly diagnosed stroke from multi-institution electronic health record (EHR) data.
Methods NIHSS scores available in the Optum© de-identified Integrated Claims-Clinical dataset were extracted from physician notes by applying natural language processing (NLP) methods. The cohort analyzed in the study consisted of 7149 patients with an inpatient or emergency room diagnosis of ischemic stroke, hemorrhagic stroke, or transient ischemic attack and a corresponding NLP-extracted NIHSS score. A subset of these patients (n = 1033, 14%) was held out for independent validation of model performance, and the remaining patients (n = 6116, 86%) were used for training the model. Several machine learning models were evaluated, and their parameters were optimized using cross-validation on the training set. The model with optimal performance, a random forest model, was ultimately evaluated on the holdout set.
Results Using machine learning, we identified the main factors in electronic health record data for assessing stroke severity, including death within the same month as stroke occurrence, length of hospital stay following stroke occurrence, aphagia/dysphagia diagnosis, hemiplegia diagnosis, and whether a patient was discharged to home or self-care. Comparing the imputed NIHSS scores to the NLP-extracted NIHSS scores on the holdout data set yielded an R² (coefficient of determination) of 0.57, an R (Pearson correlation coefficient) of 0.76, and a root-mean-squared error of 4.5.
Conclusions Machine learning models built on EHR data can be used to determine proxies for stroke severity. This enables severity to be incorporated in studies of stroke patient outcomes using administrative and EHR databases.
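The three agreement measures reported in the Results (R², Pearson R, and root-mean-squared error) can be computed from paired reference and imputed scores as in the minimal sketch below. It is an illustration of the standard formulas only, not the authors' code; the function name `agreement_metrics` and the example scores are hypothetical and are not data from the study.

```python
import math


def agreement_metrics(reference, imputed):
    """Return (r_squared, pearson_r, rmse) for two paired lists of scores.

    `reference` plays the role of the NLP-extracted scores and `imputed`
    the model output. Both names are assumptions made for this sketch.
    """
    if len(reference) != len(imputed) or len(reference) < 2:
        raise ValueError("need two paired lists with at least two values")

    n = len(reference)
    mean_ref = sum(reference) / n
    mean_imp = sum(imputed) / n

    # Residual and total sums of squares around the reference mean.
    ss_res = sum((y - p) ** 2 for y, p in zip(reference, imputed))
    ss_tot = sum((y - mean_ref) ** 2 for y in reference)
    # Spread of the imputed scores and their co-variation with the reference.
    ss_imp = sum((p - mean_imp) ** 2 for p in imputed)
    cov = sum((y - mean_ref) * (p - mean_imp)
              for y, p in zip(reference, imputed))

    if ss_tot == 0 or ss_imp == 0:
        raise ValueError("scores must not be constant")

    r_squared = 1 - ss_res / ss_tot            # coefficient of determination
    pearson_r = cov / math.sqrt(ss_tot * ss_imp)
    rmse = math.sqrt(ss_res / n)
    return r_squared, pearson_r, rmse


# Made-up scores for illustration only.
reference_scores = [2, 5, 9, 14, 21]
imputed_scores = [4, 6, 8, 12, 16]
r2, r, rmse = agreement_metrics(reference_scores, imputed_scores)
print(f"R^2 = {r2:.2f}, R = {r:.2f}, RMSE = {rmse:.2f}")
```

Computed this way, R² penalizes systematic offsets between imputed and reference scores while Pearson R does not, so R² on a holdout set need not equal the square of R.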
Pages: 8
Journal
BMC Medical Informatics and Decision Making, 20