The application of unsupervised deep learning in predictive models using electronic health records

Cited by: 14

Authors:

Wang, Lei [1,2]

Tong, Liping [3]

Davis, Darcy [3]

Arnold, Tim [4]

Esposito, Tina [3]

Affiliations:

[1] Renmin Univ China, Sch Stat, 59 Zhong Guan Cun Ave, Beijing, Peoples R China

[2] Univ Illinois, Dept Math Stat & Comp Sci, 851 S Morgan St, Chicago, IL 60607 USA

[3] Advocate Aurora Hlth, 3075 Highland Pkwy, Downers Grove, IL 60515 USA

[4] Cerner Corp, 2800 Rockcreek Pkwy, North Kansas City, MO 64117 USA

Source:

BMC MEDICAL RESEARCH METHODOLOGY | 2020 / Vol. 20 / Issue 01

Keywords:

Autoencoder; LASSO; Enhanced Reg; Predictive model; Predictive performance; Important response-specific predictors; REGRESSION; AUTOENCODER;

DOI:

10.1186/s12874-020-00923-1

Chinese Library Classification number:

R19 [Health care organizations and services (health services administration)];

Abstract:

Background: The main goal of this study is to explore the use of features representing patient-level electronic health record (EHR) data, generated by the unsupervised deep learning algorithm autoencoder, in predictive modeling. Since autoencoder features are unsupervised, this paper focuses on their general lower-dimensional representation of EHR information in a wide variety of predictive tasks.

Methods: We compare the model with autoencoder features to traditional models: a logistic model with least absolute shrinkage and selection operator (LASSO) and the Random Forest algorithm. In addition, we include a predictive model using a small subset of response-specific variables (Simple Reg) and a model combining these variables with features from autoencoder (Enhanced Reg). We performed the study first on simulated data that mimics real-world EHR data and then on actual EHR data from eight Advocate hospitals.

Results: On simulated data with incorrect categories and missing data, the precision for autoencoder is 24.16% when fixing recall at 0.7, which is higher than Random Forest (23.61%) and lower than LASSO (25.32%). The precision is 20.92% in Simple Reg and improves to 24.89% in Enhanced Reg. When using real EHR data to predict the 30-day readmission rate, the precision of autoencoder is 19.04%, which again is higher than Random Forest (18.48%) and lower than LASSO (19.70%). The precisions for Simple Reg and Enhanced Reg are 18.70% and 19.69%, respectively. That is, Enhanced Reg can have competitive prediction performance compared to LASSO. In addition, results show that Enhanced Reg usually relies on fewer features under the simulation settings of this paper.

Conclusions: We conclude that autoencoder can create useful features that represent the entire space of EHR data and are applicable to a wide array of predictive tasks. Together with important response-specific predictors, we can derive efficient and robust predictive models with less labor in data extraction and model training.
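
The results in the abstract are reported as precision with recall fixed at 0.7. As a rough illustration of how such a metric can be computed, the sketch below ranks cases by predicted risk and returns the precision at the first cutoff where recall reaches the target. The function name `precision_at_recall` and the toy labels and scores are hypothetical and do not come from the paper; the paper's exact thresholding procedure is not described in this record, and tied scores are not handled specially here.

```python
from typing import Sequence


def precision_at_recall(y_true: Sequence[int], scores: Sequence[float],
                        target_recall: float = 0.7) -> float:
    """Precision at the highest score cutoff whose recall reaches target_recall."""
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("y_true contains no positive labels")
    # Rank cases from highest to lowest predicted risk.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    for k, (_, label) in enumerate(ranked, start=1):
        true_pos += label
        # Stop at the first cutoff where enough positives have been captured.
        if true_pos / total_pos >= target_recall:
            return true_pos / k
    # Only reached if target_recall > 1: fall back to the overall positive rate.
    return total_pos / len(ranked)


# Toy data (hypothetical): 1 = readmitted within 30 days, 0 = not readmitted.
labels = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]
risk = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
print(precision_at_recall(labels, risk, target_recall=0.7))  # prints 0.5
```

In the toy data, three of the four positives are captured within the top six ranked cases, so recall is 0.75 and precision is 3/6. Fixing recall in this way lets models with different score scales be compared at the same sensitivity.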

Pages: 9

