Identifying subtypes of chronic kidney disease with machine learning: development, internal validation and prognostic validation using linked electronic health records in 350,067 individuals

Cited by: 3
Authors
Dashtban, Ashkan [1]
Mizani, Mehrdad A. [1, 2]
Pasea, Laura [1]
Denaxas, Spiros [1]
Corbett, Richard [3]
Mamza, Jil B.
Gao, He
Morris, Tamsin [4]
Hemingway, Harry [1, 5]
Banerjee, Amitava [1, 6, 7]
Affiliations
[1] UCL, Inst Hlth Informat, 222 Euston Rd, London NW1 2DA, England
[2] British Heart Fdn Data Sci Ctr, Hlth Data Res UK, London, England
[3] Imperial Coll Healthcare NHS Trust, London, England
[4] AstraZeneca, Med & Sci Affairs, BioPharmaceut Med, London, England
[5] UCL, Hlth Data Res UK, London, England
[6] Barts Hlth NHS Trust, London, England
[7] Univ Coll London Hosp NHS Trust, London, England
Source
EBIOMEDICINE | 2023 / Volume 89
Keywords
PREDICTION MODELS; RISK PREDICTION; DEATH;
DOI
10.1016/j.ebiom.2023.104489
Chinese Library Classification number
R5 [Internal Medicine]
Discipline classification code
1002; 100201
Abstract
Background: Although chronic kidney disease (CKD) is associated with high multimorbidity, polypharmacy, morbidity and mortality, existing classification systems (mild to severe, usually based on estimated glomerular filtration rate, proteinuria or urine albumin-creatinine ratio) and risk prediction models largely ignore the complexity of CKD, its risk factors and its outcomes. Improved subtype definition could improve prediction of outcomes and inform effective interventions.
Methods: We analysed individuals >= 18 years with incident and prevalent CKD (n = 350,067 and 195,422 respectively) from a population-based electronic health record resource (2006-2020; Clinical Practice Research Datalink, CPRD). We included factors (n = 264 with 2670 derived variables), e.g. demography, history, examination, blood laboratory values and medications. Using a published framework, we identified subtypes through seven unsupervised machine learning (ML) methods (K-means, Diana, HC, Fanny, PAM, Clara, Model-based) with 66 (of 2670) variables in each dataset. We evaluated subtypes for: (i) internal validity (within dataset, across methods); (ii) prognostic validity (predictive accuracy for 5-year all-cause mortality and admissions); and (iii) medications (new and existing by British National Formulary chapter).
Findings: After identifying five clusters across seven approaches, we labelled CKD subtypes: 1. Early-onset, 2. Late-onset, 3. Cancer, 4. Metabolic, and 5. Cardiometabolic. Internal validity: We trained a high-performing model (using XGBoost) that could predict disease subtypes with 95% accuracy for incident and prevalent CKD (sensitivity: 0.81-0.98, F1 score: 0.84-0.97). Prognostic validity: 5-year all-cause mortality, hospital admissions, and incidence of new chronic diseases differed across CKD subtypes. The 5-year risks of mortality and admissions in the overall incident CKD population were highest in the cardiometabolic subtype: 43.3% (42.3-42.8%) and 29.5% (29.1-30.0%), respectively, and lowest in the early-onset subtype: 5.7% (5.5-5.9%) and 18.7% (18.4-19.1%). Medications: Across CKD subtypes, the distribution of prescription medication classes at baseline varied, with the highest medication burden in the cardiometabolic and metabolic subtypes, and a higher burden in prevalent than in incident CKD.
Interpretation: In the largest CKD study using ML to date, we identified five distinct subtypes in individuals with incident and prevalent CKD. These subtypes have relevance to the study of aetiology, therapeutics and risk prediction.
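The internal-validity figures in the abstract (overall accuracy, plus a range of per-subtype sensitivity and F1 scores) can be illustrated with a minimal sketch. It only shows how such metrics are derived from true and predicted subtype labels. The function names and the toy labels below are assumptions made for illustration; they are not the study's data, and this is not the authors' XGBoost pipeline.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of individuals whose predicted subtype matches the true one."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true, y_pred):
    """Return {label: (sensitivity, f1)} for a multi-class prediction."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fn[t] += 1  # the true class was missed
            fp[p] += 1  # the predicted class was wrongly assigned
    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        recall_den = tp[label] + fn[label]
        prec_den = tp[label] + fp[label]
        sensitivity = tp[label] / recall_den if recall_den else 0.0
        precision = tp[label] / prec_den if prec_den else 0.0
        denom = precision + sensitivity
        f1 = 2 * precision * sensitivity / denom if denom else 0.0
        metrics[label] = (sensitivity, f1)
    return metrics


# Hypothetical toy labels (three of the five subtype names, invented values).
y_true = ["early", "early", "late", "late", "cancer", "cancer"]
y_pred = ["early", "late", "late", "late", "cancer", "cancer"]

print(accuracy(y_true, y_pred))
for label, (sens, f1) in per_class_metrics(y_true, y_pred).items():
    print(label, round(sens, 2), round(f1, 2))
```

Reporting the minimum and maximum of the per-class values across subtypes yields a range of the same form as the one quoted in the abstract.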
Pages: 13