Variational Bayes latent class analysis for EHR-based phenotyping with large real-world data

Authors
Buckley, Brian [1]
O'Hagan, Adrian [1, 2]
Galligan, Marie [3]
Affiliations
[1] Univ Coll Dublin, Sch Math & Stat, Dublin, Ireland
[2] Univ Coll Dublin, Insight Ctr Data Analyt, Dublin, Ireland
[3] Univ Coll Dublin, Sch Med, Dublin, Ireland
Keywords
variational Bayes; latent class analysis; patient phenotyping; real-world evidence; electronic health records
DOI
10.3389/fams.2024.1302825
Chinese Library Classification
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract
Introduction: Bayesian approaches to patient phenotyping in clinical observational studies have been limited by the computational challenges associated with applying the Markov Chain Monte Carlo (MCMC) approach to real-world data. Approximate Bayesian inference via optimization of the variational evidence lower bound, variational Bayes (VB), has been successfully demonstrated for other applications.
Methods: We investigate the performance and characteristics of currently available VB and MCMC software to explore the practicability of available approaches and provide guidance for clinical practitioners. Two case studies are used to fully explore the methods covering a variety of real-world data. First, we use the publicly available Pima Indian diabetes data to comprehensively compare VB implementations of logistic regression. Second, a large real-world data set, Optum™ EHR with approximately one million diabetes patients, extended the analysis to large, highly unbalanced data containing discrete and continuous variables. A Bayesian patient phenotyping composite model incorporating latent class analysis (LCA) and regression was implemented with the second case study.
Results: We find that several data characteristics common in clinical data, such as sparsity, significantly affect the posterior accuracy of automatic VB methods compared with conditionally conjugate mean-field methods. We find that for both models, automatic VB approaches require more effort and technical knowledge to set up for accurate posterior estimation and are very sensitive to stopping time compared with closed-form VB methods.
Discussion: Our results indicate that the patient phenotyping composite Bayes model is more easily usable for real-world studies if Monte Carlo is replaced with VB. It can potentially become a uniquely useful tool for decision support, especially for rare diseases where gold-standard biomarker data are sparse but prior knowledge can be used to assist model diagnosis and may suggest when biomarker tests are warranted.
Pages: 14