Population-Level Prediction of Type 2 Diabetes From Claims Data and Analysis of Risk Factors

Times cited: 118
Authors
Razavian, Narges [1]
Blecker, Saul [2]
Schmidt, Ann Marie [3]
Smith-McLallen, Aaron [4,5]
Nigam, Somesh [4,5]
Sontag, David [1]
Affiliations
[1] NYU, Dept Comp Sci, New York, NY 10003 USA
[2] NYU, NYU Langone Med Ctr, Dept Populat Hlth, New York, NY 10003 USA
[3] NYU, Dept Biochem & Mol Pharmacol, Dept Med Pathol, Dept Med,NYU Langone Med Ctr, New York, NY 10003 USA
[4] NYU, NYU Langone Med Ctr, Diabet Res Program, New York, NY 10003 USA
[5] Independence Blue Cross, Adv Analyt, Philadelphia, PA USA
Keywords
big data analytics; data mining; machine learning; predictive analytics; risk assessment; disease prediction; longitudinal study; LIFE-STYLE; CARDIOVASCULAR-DISEASE; INSULIN-RESISTANCE; PREVENTION; MELLITUS; MODELS; ADULTS; TOOL; ANALYTICS; METFORMIN;
DOI
10.1089/big.2015.0020
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
We present a new approach to population health, in which data-driven predictive models are learned for outcomes such as type 2 diabetes. Our approach enables risk assessment from readily available electronic claims data on large populations, without additional screening cost. The proposed model uncovers early and late-stage risk factors. Using administrative claims, pharmacy records, healthcare utilization, and laboratory results of 4.1 million individuals between 2005 and 2009, an initial set of 42,000 variables was derived that together describe the full health status and history of every individual. Machine learning was then used to methodically enhance the predictive variable set and fit models predicting onset of type 2 diabetes in 2009-2011, 2010-2012, and 2011-2013. We compared the enhanced model with a parsimonious model consisting of known diabetes risk factors in a real-world environment, where missing values are common. Furthermore, we analyzed novel and known risk factors emerging from the model in different age groups and at different stages before onset. A parsimonious model using 21 classic diabetes risk factors resulted in an area under the ROC curve (AUC) of 0.75 for diabetes prediction within a 2-year window following the baseline. The enhanced model increased the AUC to 0.80, with about 900 variables selected as predictive (p<0.0001 for differences between AUCs). Similar improvements were observed for models predicting diabetes onset 1-3 years and 2-4 years after baseline. The enhanced model improved positive predictive value by at least 50% and identified novel surrogate risk factors for type 2 diabetes, such as chronic liver disease (odds ratio [OR] 3.71), high alanine aminotransferase (OR 2.26), esophageal reflux (OR 1.85), and history of acute bronchitis (OR 1.45). Liver risk factors emerge later in the process of diabetes development compared with obesity-related factors such as hypertension and high hemoglobin A1c.
In conclusion, population-level risk prediction for type 2 diabetes using readily available administrative data is feasible and has better prediction performance than classical diabetes risk prediction algorithms on very large populations with missing data. The new model enables intervention allocation at national scale quickly and accurately and recovers potentially novel risk factors at different stages before the disease onset.
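The abstract reports three kinds of figures: AUC, positive predictive value, and odds ratios. As a reading aid, the sketch below shows the textbook definition of each on a tiny made-up dataset. Every number and function name in it is hypothetical and none comes from the study. The abstract does not say how the reported odds ratios were derived, so the unadjusted 2x2-table form used here is an assumption for illustration only.

```python
# Illustrative definitions only; the scores, labels, and counts are invented
# toy values, not data or results from the study.

def auc(scores, labels):
    """Rank-based AUC: chance that a random positive outscores a random
    negative, counting ties as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ppv(scores, labels, threshold):
    """Positive predictive value: share of true cases among the individuals
    whose score is at or above the threshold."""
    flagged = [y for s, y in zip(scores, labels) if s >= threshold]
    if not flagged:
        raise ValueError("no individual is flagged at this threshold")
    return sum(flagged) / len(flagged)


def odds_ratio(exposed_cases, exposed_noncases,
               unexposed_cases, unexposed_noncases):
    """Unadjusted odds ratio from a 2x2 table (assumed form, see lead-in)."""
    return (exposed_cases * unexposed_noncases) / (
        exposed_noncases * unexposed_cases)


# Hypothetical risk scores for eight individuals; 1 = later developed diabetes.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 0, 1, 0]

toy_auc = auc(toy_scores, toy_labels)
toy_ppv = ppv(toy_scores, toy_labels, threshold=0.6)
toy_or = odds_ratio(20, 80, 10, 90)
print(toy_auc, toy_ppv, toy_or)
```

An AUC is read as a ranking quality: on this scale 0.5 corresponds to random ordering and 1.0 to perfect separation of future cases from non-cases, which is the scale on which the reported 0.75 and 0.80 are compared.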
Pages: 277 - 287
Page count: 11
Related papers
50 in total
  • [1] Population-level Prediction of Type 2 Diabetes from Insurance Claims and Analysis of Risk Factors
    Razavian, Narges
    Smith-Mclallen, Aaron
    Nigam, Somesh
    Blecker, Saul
    Schmidt, Ann Marie
    Sontag, David
    [J]. DIABETES, 2015, 64 : A41 - A41
  • [2] Population-Level Interventions Targeting Risk Factors for Hypertension and Diabetes in Rwanda: A Situational Analysis
    Nganabashaka, Jean Pierre
    Ntawuyirushintege, Seleman
    Niyibizi, Jean Berchmans
    Umwali, Ghislaine
    Bavuma, Charlotte M. M.
    Byiringiro, Jean Claude
    Rulisa, Stephen
    Burns, Jacob
    Rehfuess, Eva
    Young, Taryn
    Tumusiime, David K. K.
    [J]. FRONTIERS IN PUBLIC HEALTH, 2022, 10
  • [3] Population-Level Approaches to Preventing Type 2 Diabetes Globally
    Siegel, Karen R.
    Albright, Ann L.
    [J]. ENDOCRINOLOGY AND METABOLISM CLINICS OF NORTH AMERICA, 2021, 50 (03) : 401 - 414
  • [4] Type 2 Diabetes Remission-Analysis of Three Population-Level Historical Cohorts
    Tangelloju, Srikanth
    Vu, Giang
    Chavis-Blakely, Hunter
    Little, Bert B.
    [J]. DIABETES, 2018, 67
  • [5] Population-level risk factors, population health, and health policy
    Naumova, Elena N.
    Cohen, Steven A.
    [J]. JOURNAL OF PUBLIC HEALTH POLICY, 2008, 29 (03) : 290 - 298
  • [6] Estimating the Prevalence and Incidence of Type 2 Diabetes Mellitus Using Population Level Pharmacy Claims Data
    Sinnott, Sarah-Jo
    McHugh, Sheena
    Whelton, Helen
    Layte, Richard
    Barron, Steve
    Kearney, Patricia M.
    [J]. PHARMACOEPIDEMIOLOGY AND DRUG SAFETY, 2016, 25 : 74 - 74
  • [7] Microvascular disease and risk of cardiovascular events among individuals with type 2 diabetes: a population-level cohort study
    Brownrigg, Jack R. W.
    Hughes, Cian O.
    Burleigh, David
    Karthikesalingam, Alan
    Patterson, Benjamin O.
    Holt, Peter J.
    Thompson, Matthew M.
    de Lusignan, Simon
    Ray, Kausik K.
    Hinchliffe, Robert J.
    [J]. LANCET DIABETES & ENDOCRINOLOGY, 2016, 4 (07) : 588 - 597
  • [8] The Risk Factors Potentially Influencing Hospital Admission in People with Diabetes, Following SARS-CoV-2 Infection: A Population-Level Analysis
    Adrian H. Heald
    David A. Jenkins
    Richard Williams
    Matthew Sperrin
    Helene Fachim
    Rajshekhar N. Mudaliar
    Akheel Syed
    Asma Naseem
    J. Martin Gibson
    Kelly A. Bowden Davies
    Niels Peek
    Simon G. Anderson
    Yonghong Peng
    William Ollier
    [J]. Diabetes Therapy, 2022, 13 : 1007 - 1021
  • [9] The Risk Factors Potentially Influencing Hospital Admission in People with Diabetes, Following SARS-CoV-2 Infection: A Population-Level Analysis
    Heald, Adrian H.
    Jenkins, David A.
    Williams, Richard
    Sperrin, Matthew
    Fachim, Helene
    Mudaliar, Rajshekhar N.
    Syed, Akheel
    Naseem, Asma
    Gibson, J. Martin
    Davies, Kelly A. Bowden
    Peek, Niels
    Anderson, Simon G.
    Peng, Yonghong
    Ollier, William
    [J]. DIABETES THERAPY, 2022, 13 (05) : 1007 - 1021
  • [10] Commentary: Population-level Risk Factors, Population Health, and Health Policy
    Elena N Naumova
    Steven A Cohen
    [J]. Journal of Public Health Policy, 2008, 29 : 290 - 298