Identifying clinically important COPD sub-types using data-driven approaches in primary care population based electronic health records

Cited by: 56
Authors
Pikoula, Maria [1 ,2 ]
Quint, Jennifer Kathleen [2 ,3 ,4 ]
Nissen, Francis [2 ,4 ]
Hemingway, Harry [1 ,2 ]
Smeeth, Liam [2 ,4 ]
Denaxas, Spiros [1 ,2 ]
Affiliations
[1] UCL, Inst Hlth Informat, 222 Euston Rd, London NW1 2DA, England
[2] UCL, Hlth Data Res UK London, 222 Euston Rd, London NW1 2DA, England
[3] Imperial Coll London, Natl Heart & Lung Inst, Resp Epidemiol, Occupat Med & Publ Hlth, London, England
[4] Sch Hyg & Trop Med, EHR Res Grp, London, England
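Funding
UK Medical Research Council; UK Economic and Social Research Council; UK Engineering and Physical Sciences Research Council; Wellcome Trust (UK)
Keywords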
COPD epidemiology; COPD exacerbations; Electronic health records; Cluster analysis; OBSTRUCTIVE PULMONARY-DISEASE; CLUSTER-ANALYSIS; SUBGROUPS;
DOI
10.1186/s12911-019-0805-0
Chinese Library Classification (CLC) number
R-058
Abstract
Background: COPD is a highly heterogeneous disease composed of different phenotypes with different aetiological and prognostic profiles and current classification systems do not fully capture this heterogeneity. In this study we sought to discover, describe and validate COPD subtypes using cluster analysis on data derived from electronic health records.
Methods: We applied two unsupervised learning algorithms (k-means and hierarchical clustering) in 30,961 current and former smokers diagnosed with COPD, using linked national structured electronic health records in England available through the CALIBER resource. We used 15 clinical features, including risk factors and comorbidities and performed dimensionality reduction using multiple correspondence analysis. We compared the association between cluster membership and COPD exacerbations and respiratory and cardiovascular death with 10,736 deaths recorded over 146,466 person-years of follow-up. We also implemented and tested a process to assign unseen patients into clusters using a decision tree classifier.
Results: We identified and characterized five COPD patient clusters with distinct patient characteristics with respect to demographics, comorbidities, risk of death and exacerbations. The four subgroups were associated with 1) anxiety/depression; 2) severe airflow obstruction and frailty; 3) cardiovascular disease and diabetes and 4) obesity/atopy. A fifth cluster was associated with low prevalence of most comorbid conditions.
Conclusions: COPD patients can be sub-classified into groups with differing risk factors, comorbidities, and prognosis, based on data included in their primary care records. The identified clusters confirm findings of previous clustering studies and draw attention to anxiety and depression as important drivers of the disease in young, female patients.
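The abstract's pipeline (categorical clinical features, dimensionality reduction by multiple correspondence analysis, then k-means) can be illustrated with a minimal sketch. This is not the authors' code: the data are synthetic, the six binary features and two planted groups are invented for illustration, and the helper names `mca_coordinates` and `kmeans` are hypothetical. MCA is written here as correspondence analysis of a 0/1 indicator matrix via SVD, and k-means as plain Lloyd iterations; the hierarchical clustering and decision tree steps are left out.

```python
import numpy as np


def mca_coordinates(indicator, n_components=2):
    """Row principal coordinates from correspondence analysis of a 0/1 indicator matrix."""
    Z = np.asarray(indicator, dtype=float)
    P = Z / Z.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses (must all be > 0)
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, s, _ = np.linalg.svd(S, full_matrices=False)
    return (U[:, :n_components] * s[:n_components]) / np.sqrt(r)[:, None]


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Synthetic stand-in for binary clinical features (assumption, not study data):
# two planted groups with opposite prevalence patterns across six features.
rng = np.random.default_rng(42)
group_a = rng.random((100, 6)) < [0.8, 0.8, 0.8, 0.2, 0.2, 0.2]
group_b = rng.random((100, 6)) < [0.2, 0.2, 0.2, 0.8, 0.8, 0.8]
features = np.vstack([group_a, group_b]).astype(int)

# One indicator column per category: "present" and "absent" for each feature.
indicator = np.hstack([features, 1 - features])

coords = mca_coordinates(indicator, n_components=2)
labels, centers = kmeans(coords, k=2)
print(coords.shape, np.bincount(labels, minlength=2))
```

Clustering on the reduced coordinates rather than on the raw indicators is the design choice the abstract describes; the number of retained components and the number of clusters (2 here, 5 in the study) are tuning decisions this sketch does not address.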
Pages: 14
Related papers
50 in total
  • [1] Identifying clinically important COPD sub-types using data-driven approaches in primary care population based electronic health records
    Maria Pikoula
    Jennifer Kathleen Quint
    Francis Nissen
    Harry Hemingway
    Liam Smeeth
    Spiros Denaxas
    [J]. BMC Medical Informatics and Decision Making, 19
  • [2] Patterns of polypharmacy before diagnosis of dementia: a data-driven, retrospective, population-based study with primary care electronic health records
    Longo, Elisabetta
    Huo, Lin
    Burnett, Bruce
    Demmler, Joanne
    Morris, Andrew
    Brophy, Sinead
    Lyons, Ronan A.
    Zhou, Shang-Ming
    [J]. LANCET, 2019, 394 : 67 - 67
  • [3] Identifying data-driven subtypes of major depressive disorder with electronic health records
    Sharma, Abhishek
    Verhaak, Pilar F.
    McCoy, Thomas H.
    Perlis, Roy H.
    Doshi-Velez, Finale
    [J]. JOURNAL OF AFFECTIVE DISORDERS, 2024, 356 : 64 - 70
  • [4] Data-driven modeling of clinical pathways using electronic health records
    Funkner, Anastasia A.
    Yakovlev, Aleksey N.
    Kovalchuk, Sergey V.
    [J]. CENTERIS 2017 - INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS / PROJMAN 2017 - INTERNATIONAL CONFERENCE ON PROJECT MANAGEMENT / HCIST 2017 - INTERNATIONAL CONFERENCE ON HEALTH AND SOCIAL CARE INFORMATION SYSTEMS AND TECHNOLOGIES, CENTERI, 2017, 121 : 835 - 842
  • [5] Data-Driven Population Health Shapes a New Model of Primary Care
    Burroughs, Jon
    Smith, Ron
    [J]. JOURNAL OF HEALTHCARE MANAGEMENT, 2021, 66 (01) : 9 - 13
  • [6] Identifying cancer sub-types from genomic scale data sets using confidence based integration (CBI)
    Sreekumar, R.
    Khursheed, Farida
    [J]. JOURNAL OF BIOMEDICAL INFORMATICS, 2022, 126
  • [7] Estimating the population health burden of musculoskeletal conditions using primary care electronic health records
    Yu, Dahai
    Peat, George
    Jordan, Kelvin P.
    Bailey, James
    Prieto-Alhambra, Daniel
    Robinson, Danielle E.
    Strauss, Victoria Y.
    Walker-Bone, Karen
    Silman, Alan
    Mamas, Mamas
    Blackburn, Steven
    Dent, Stephen
    Dunn, Kate
    Judge, Andrew
    Protheroe, Joanne
    Wilkie, Ross
    [J]. RHEUMATOLOGY, 2021, 60 (10) : 4832 - 4843
  • [8] Using data-driven approaches to improve delivery of animal health care interventions for public health
    Mazeri, Stella
    Bailey, Jordana L. Burdon
    Mayer, Dagmar
    Chikungwa, Patrick
    Chulu, Julius
    Grossman, Paul Orion
    Lohr, Frederic
    Gibson, Andrew D.
    Handel, Ian G.
    Bronsvoort, Barend M. deC
    Gamble, Luke
    Mellanby, Richard J.
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2021, 118 (05)
  • [9] Data-Driven Approaches to Care Delivery: Actionable Informatics in the DoD's Primary Care Behavioral Health Program
    Kincaid, Melissa W.
    Peters, Zachary J.
    Curry, Justin C.
    [J]. FAMILIES SYSTEMS & HEALTH, 2021, 39 (01) : 66 - 76
  • [10] Data-driven analysis to understand long COVID using electronic health records from the RECOVER initiative
    Zang, Chengxi
    Zhang, Yongkang
    Xu, Jie
    Bian, Jiang
    Morozyuk, Dmitry
    Schenck, Edward J.
    Khullar, Dhruv
    Nordvig, Anna S.
    Shenkman, Elizabeth A.
    Rothman, Russell L.
    Block, Jason P.
    Lyman, Kristin
    Weiner, Mark G.
    Carton, Thomas W.
    Wang, Fei
    Kaushal, Rainu
    [J]. NATURE COMMUNICATIONS, 2023, 14 (01)