Prediction of early childhood obesity with machine learning and electronic health record data

Times cited: 24
Authors
Pang, Xueqin [1]
Forrest, Christopher B. [2,3]
Le-Scherban, Felice [4,5]
Masino, Aaron J. [1,3]
Affiliations
[1] Children's Hospital of Philadelphia, Department of Biomedical and Health Informatics, 2716 South St, Philadelphia, PA 19104, USA
[2] Children's Hospital of Philadelphia, Applied Clinical Research Center, Philadelphia, PA 19104, USA
[3] University of Pennsylvania, Perelman School of Medicine, Department of Anesthesiology and Critical Care Medicine, Philadelphia, PA 19104, USA
[4] Drexel University, Dornsife School of Public Health, Department of Epidemiology and Biostatistics, Philadelphia, PA, USA
[5] Drexel University, Drexel Urban Health Collaborative, Philadelphia, PA, USA
Keywords
Data quality control; Early childhood obesity; Electronic health record; Machine learning; Prediction; Racial/ethnic disparities; Multivariate data; Environment; Overweight; Adulthood; Weight; Values; NHANES; Care
DOI
10.1016/j.ijmedinf.2021.104454
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract

Objective: This study compares seven machine learning models developed to predict childhood obesity between ages >2 and ≤7 years using Electronic Health Record (EHR) data up to age 2 years.

Materials and methods: EHR data from 860,510 patients with 11,194,579 healthcare encounters were obtained from the Children's Hospital of Philadelphia. After applying stringent quality control to remove implausible growth values, and restricting the cohort to individuals who attended all recommended wellness visits by age 7 years, 27,203 patients (50.78% male) remained for model development. Seven machine learning models were developed to predict obesity incidence as defined by the Centers for Disease Control and Prevention (age- and sex-adjusted BMI > 95th percentile). Model performance was evaluated with multiple standard classifier metrics, and differences among the seven models were assessed using Cochran's Q test with post-hoc pairwise testing.

Results: XGBoost achieved an AUC of 0.81 (0.001), outperforming all other models. With sensitivity fixed at 80%, it also performed statistically significantly better than all other models on standard classifier metrics: precision 30.90% (0.22%), F1-score 44.60% (0.26%), accuracy 66.14% (0.41%), and specificity 63.27% (0.41%).

Discussion and conclusion: Early childhood obesity prediction models were developed from the largest cohort reported to date. Relative to prior research, our models cover both males and females in a single model and extend the time frame for obesity incidence prediction to 7 years of age. The presented machine learning model development workflow can be adapted to various EHR-based studies and may be valuable for developing other clinical prediction models.
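The results are reported at a fixed sensitivity of 80%, and the seven models are compared with Cochran's Q test. As a rough illustration of both calculations, here is a minimal sketch that assumes binary labels (1 = obese), one risk score per patient, and a threshold rule that picks the highest score cut-off reaching the target sensitivity. The helper names `metrics_at_sensitivity` and `cochrans_q` and the toy data are hypothetical. This is not the authors' implementation, and the post-hoc pairwise tests are not shown.

```python
from typing import Dict, List, Sequence


def metrics_at_sensitivity(y_true: Sequence[int], scores: Sequence[float],
                           target: float = 0.80) -> Dict[str, float]:
    """Classifier metrics at the highest score threshold whose sensitivity
    reaches `target` (hypothetical helper, not the authors' code)."""
    if not 0 < target <= 1:
        raise ValueError("target must be in (0, 1]")
    pos = sum(y_true)            # obese cases (label 1)
    neg = len(y_true) - pos      # non-obese cases (label 0)
    if pos == 0 or neg == 0:
        raise ValueError("need both classes")
    # Scanning thresholds from high to low, sensitivity can only increase,
    # so the first threshold that reaches the target is the highest one.
    for thr in sorted(set(scores), reverse=True):
        pred = [1 if s >= thr else 0 for s in scores]
        tp = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 1)
        sens = tp / pos
        if sens >= target:
            fp = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 1)
            tn = neg - fp
            prec = tp / (tp + fp)
            return {
                "threshold": thr,
                "sensitivity": sens,
                "precision": prec,
                "f1": 2 * prec * sens / (prec + sens),
                "accuracy": (tp + tn) / len(y_true),
                "specificity": tn / neg,
            }
    # The lowest threshold flags every case (sensitivity 1), so this is
    # only reached if the inputs are malformed.
    raise ValueError("target sensitivity not reachable")


def cochrans_q(correct: List[List[int]]) -> float:
    """Cochran's Q for k classifiers evaluated on the same test cases.

    correct[i][j] is 1 if classifier j classified case i correctly, else 0.
    Under the null hypothesis that all classifiers are equally accurate,
    Q is approximately chi-square distributed with k - 1 degrees of freedom
    (a p-value can be obtained with, e.g., scipy.stats.chi2.sf(q, k - 1)).
    """
    k = len(correct[0])
    col = [sum(r[j] for r in correct) for j in range(k)]  # correct count per classifier
    row = [sum(r) for r in correct]                       # classifiers correct per case
    total = sum(row)
    den = k * total - sum(n * n for n in row)
    if den == 0:
        raise ValueError("classifiers never disagree; Q is undefined")
    return (k - 1) * (k * sum(g * g for g in col) - total ** 2) / den


if __name__ == "__main__":
    y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    s = [0.9, 0.85, 0.8, 0.3, 0.2, 0.7, 0.6, 0.4, 0.1, 0.05]
    print(metrics_at_sensitivity(y, s))
    print(cochrans_q([[1, 1, 0], [1, 0, 0], [1, 1, 1],
                      [1, 0, 0], [0, 0, 0], [1, 1, 0]]))
```

The toy Cochran's Q example evaluates to 6.0. As a quick arithmetic check on the reported figures, F1 = 2PR/(P + R) with P = 0.309 and R = 0.80 gives 0.4944 / 1.109 ≈ 0.446, close to the reported F1-score of 44.60%.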
Pages: 8