The application of unsupervised deep learning in predictive models using electronic health records

Cited by: 14
Authors
Wang, Lei [1 ,2 ]
Tong, Liping [3 ]
Davis, Darcy [3 ]
Arnold, Tim [4 ]
Esposito, Tina [3 ]
Affiliations
[1] Renmin Univ China, Sch Stat, 59 Zhong Guan Cun Ave, Beijing, Peoples R China
[2] Univ Illinois, Dept Math Stat & Comp Sci, 851 S Morgan St, Chicago, IL 60607 USA
[3] Advocate Aurora Hlth, 3075 Highland Pkwy, Downers Grove, IL 60515 USA
[4] Cerner Corp, 2800 Rockcreek Pkwy, North Kansas City, MO 64117 USA
Keywords
Autoencoder; LASSO; Enhanced Reg; Predictive model; Predictive performance; Important response-specific predictors; Regression
DOI
10.1186/s12874-020-00923-1
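Chinese Library Classification number
R19 [Health care organization and services (health services administration)]
Discipline classification code
Abstract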
Background: The main goal of this study is to explore the use of features representing patient-level electronic health record (EHR) data, generated by the unsupervised deep learning algorithm autoencoder, in predictive modeling. Because autoencoder features are learned without reference to any outcome, this paper focuses on their general lower-dimensional representation of EHR information across a wide variety of predictive tasks.

Methods: We compare a model built on autoencoder features with traditional models: a logistic model with the least absolute shrinkage and selection operator (LASSO) and the Random Forest algorithm. In addition, we include a predictive model using a small subset of response-specific variables (Simple Reg) and a model combining these variables with autoencoder features (Enhanced Reg). We performed the study first on simulated data that mimic real-world EHR data and then on actual EHR data from eight Advocate hospitals.

Results: On simulated data with incorrect categories and missing data, the precision of the autoencoder model is 24.16% with recall fixed at 0.7, which is higher than Random Forest (23.61%) and lower than LASSO (25.32%). The precision is 20.92% for Simple Reg and improves to 24.89% for Enhanced Reg. When real EHR data are used to predict 30-day readmission, the precision of the autoencoder model is 19.04%, which again is higher than Random Forest (18.48%) and lower than LASSO (19.70%). The precisions for Simple Reg and Enhanced Reg are 18.70% and 19.69%, respectively. That is, Enhanced Reg can achieve prediction performance competitive with LASSO. In addition, under the simulation settings of this paper, Enhanced Reg usually relies on fewer features.

Conclusions: We conclude that autoencoder can create useful features that represent the entire space of EHR data and are applicable to a wide array of predictive tasks. Together with important response-specific predictors, these features allow us to derive efficient and robust predictive models with less labor in data extraction and model training.
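
The abstract reports precision with recall fixed at 0.7 but does not say how the cutoff was chosen. The following is a minimal sketch of one common way to compute such a figure, assuming the cutoff is the highest predicted-risk threshold at which recall first reaches the target. The function name `precision_at_recall` and the toy labels and scores are hypothetical and are not taken from the paper.

```python
def precision_at_recall(y_true, scores, target_recall=0.7):
    """Return (precision, recall, threshold) at the highest score threshold
    whose recall is at least target_recall.

    y_true : list of 0/1 labels (1 = event, e.g. a readmission)
    scores : list of predicted risks, same length as y_true
    """
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("y_true contains no positive cases")
    if not 0 < target_recall <= 1:
        raise ValueError("target_recall must be in (0, 1]")

    # Rank cases from highest to lowest predicted risk.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)

    true_pos = 0
    flagged = 0
    n = len(ranked)
    while flagged < n:
        threshold = ranked[flagged][0]
        # Cases tied at the same score are flagged together.
        while flagged < n and ranked[flagged][0] == threshold:
            true_pos += ranked[flagged][1]
            flagged += 1
        recall = true_pos / total_pos
        if recall >= target_recall:
            return true_pos / flagged, recall, threshold


# Toy data (hypothetical): 4 events among 10 cases.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
precision, recall, threshold = precision_at_recall(y_true, scores, 0.7)
print(precision, recall, threshold)
```

Because recall can only take a finite set of values on a finite sample, the achieved recall may exceed the target slightly; in the toy data the first threshold reaching 0.7 gives a recall of 0.75.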
Pages: 9
Related papers
50 in total
  • [1] The application of unsupervised deep learning in predictive models using electronic health records
    Wang, Lei
    Tong, Liping
    Davis, Darcy
    Arnold, Tim
    Esposito, Tina
    [J]. BMC Medical Research Methodology, 20
  • [2] Federated learning of predictive models from federated Electronic Health Records
    Brisimi, Theodora S.
    Chen, Ruidi
    Mela, Theofanie
    Olshevsky, Alex
    Paschalidis, Ioannis Ch.
    Shi, Wei
    [J]. INTERNATIONAL JOURNAL OF MEDICAL INFORMATICS, 2018, 112 : 59 - 67
  • [3] Unsupervised probabilistic models for sequential Electronic Health Records
    Kaplan, Alan D.
    Greene, John D.
    Liu, Vincent X.
    Ray, Priyadip
    [J]. JOURNAL OF BIOMEDICAL INFORMATICS, 2022, 134
  • [4] Readmission prediction using deep learning on electronic health records
    Ashfaq, Awais
    Sant'Anna, Anita
    Lingman, Markus
    Nowaczyk, Slawomir
    [J]. JOURNAL OF BIOMEDICAL INFORMATICS, 2019, 97
  • [5] Interpretation Attacks and Defenses on Predictive Models Using Electronic Health Records
    Razmi, Fereshteh
    Lou, Jian
    Hong, Yuan
    Xiong, Li
    [J]. MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES: RESEARCH TRACK, ECML PKDD 2023, PT III, 2023, 14171 : 446 - 461
  • [6] Descriptive and Predictive Analytics on Electronic Health Records using Machine Learning
    Anandi, V
    Ramesh, M.
    [J]. 2022 SECOND INTERNATIONAL CONFERENCE ON ADVANCES IN ELECTRICAL, COMPUTING, COMMUNICATION AND SUSTAINABLE TECHNOLOGIES (ICAECT), 2022
  • [7] Deep Learning for Electronic Health Records Analytics
    Harerimana, Gaspard
    Kim, Jong Wook
    Yoo, Hoon
    Jang, Beakcheol
    [J]. IEEE ACCESS, 2019, 7 : 101245 - 101259
  • [8] A Survey of Deep Learning for Electronic Health Records
    Xu, Jiabao
    Xi, Xuefeng
    Chen, Jie
    Sheng, Victor S.
    Ma, Jieming
    Cui, Zhiming
    [J]. APPLIED SCIENCES-BASEL, 2022, 12 (22)
  • [9] Facilitating the Development of Deep Learning Models with Visual Analytics for Electronic Health Records
    Hur, Cinyoung
    Wi, JeongA
    Kim, YoungBin
    [J]. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH, 2020, 17 (22) : 1 - 14
  • [10] Scalable and Interpretable Predictive Models for Electronic Health Records
    Fejza, Amela
    Geneves, Pierre
    Layaida, Nabil
    Bosson, Jean-Luc
    [J]. 2018 IEEE 5TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA), 2018 : 341 - 350