Synthesize high-dimensional longitudinal electronic health records via hierarchical autoregressive language model

Authors
Brandon Theodorou
Cao Xiao
Jimeng Sun
Affiliations
[1] University of Illinois at Urbana-Champaign
[2] Medisyn Inc.
Abstract
Synthetic electronic health records (EHRs) that are both realistic and privacy-preserving offer an alternative to real EHRs for machine learning (ML) and statistical analysis. However, generating high-fidelity EHR data in its original, high-dimensional form poses challenges for existing methods. We propose the Hierarchical Autoregressive Language mOdel (HALO) for generating longitudinal, high-dimensional EHRs that preserve the statistical properties of real EHRs and can be used to train accurate ML models without privacy concerns. HALO generates a probability density function over medical codes, clinical visits, and patient records, which allows it to generate realistic EHR data without requiring variable selection or aggregation. Extensive experiments demonstrated that HALO can generate high-fidelity data whose high-dimensional disease code probabilities closely mirror those of real EHR data (R² above 0.9). HALO also enhances the accuracy of predictive modeling and enables downstream ML models to attain accuracy similar to that of models trained on genuine data.
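The fidelity claim above (code probabilities matching real data with R² above 0.9) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the toy records, and the choice of visit-level prevalence are all assumptions made for illustration, and the abstract does not state which definition of R² was used. Here R² is taken to be the coefficient of determination, with the synthetic probabilities treated as predictions of the real ones.

```python
from collections import Counter


def code_probabilities(patients, vocabulary):
    """Fraction of visits in which each code appears (visit-level prevalence).

    `patients` is a list of patient records; each record is a list of visits;
    each visit is a list of medical codes. This layout is an assumption.
    """
    counts = Counter()
    n_visits = 0
    for visits in patients:
        for visit in visits:
            n_visits += 1
            counts.update(set(visit))  # count each code at most once per visit
    return [counts[code] / n_visits for code in vocabulary]


def r_squared(real, synth):
    """Coefficient of determination of synthetic vs. real probabilities."""
    mean_real = sum(real) / len(real)
    ss_res = sum((r - s) ** 2 for r, s in zip(real, synth))
    ss_tot = sum((r - mean_real) ** 2 for r in real)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy records, not taken from any real or published dataset.
real_patients = [
    [["I10", "E11"], ["I10"]],
    [["E11", "N18"], ["I10", "N18"]],
]
synthetic_patients = [
    [["I10"], ["I10", "E11"]],
    [["N18"], ["E11", "I10"]],
]
vocabulary = ["I10", "E11", "N18"]

real_p = code_probabilities(real_patients, vocabulary)
synth_p = code_probabilities(synthetic_patients, vocabulary)
score = r_squared(real_p, synth_p)
print(f"R^2 between real and synthetic code probabilities: {score:.3f}")
```

With a vocabulary this small the score is very sensitive to a single mismatched code; a comparison of this kind only becomes informative over the thousands of codes found in a realistic EHR vocabulary.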