DisC2o-HD: Distributed causal inference with covariates shift for analyzing real-world high-dimensional data

Authors
Tong, Jiayi [1]
Hu, Jie [1]
Hripcsak, George [2]
Ning, Yang [3]
Chen, Yong [1]
Affiliations
[1] Univ Penn, Dept Biostat Epidemiol & Informat, Philadelphia, PA 19104 USA
[2] Columbia Univ, Dept Biomed Informat, New York, NY 10027 USA
[3] Cornell Univ, Dept Stat & Data Sci, Ithaca, NY 14853 USA
Keywords
Causal Inference; Distribution Shift; Federated Learning; High-dimensional Data; Real-World Data; REGULARIZED CALIBRATED ESTIMATION; ELECTRONIC HEALTH RECORDS; COMMUNICATION-EFFICIENT; CONFIDENCE-REGIONS; PROPENSITY SCORE; TESTS
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
High-dimensional healthcare data, such as electronic health records (EHR) data and claims data, present two primary challenges due to the large number of variables and the need to consolidate data from multiple clinical sites. The third key challenge is the potential existence of heterogeneity in terms of covariate shift. In this paper, we propose a distributed learning algorithm accounting for covariate shift to estimate the average treatment effect (ATE) for high-dimensional data, named DisC2o-HD. Leveraging the surrogate likelihood method, our method calibrates the estimates of the propensity score and outcome models to approximately attain the desired covariate balancing property, while accounting for the covariate shift across multiple clinical sites. We show that our distributed covariate balancing propensity score estimator can approximate the pooled estimator, which is obtained by pooling the data from multiple sites together. The proposed estimator remains consistent if either the propensity score model or the outcome regression model is correctly specified. The semiparametric efficiency bound is achieved when both the propensity score and the outcome models are correctly specified. We conduct simulation studies to demonstrate the performance of the proposed algorithm; additionally, we conduct an empirical study to present the readiness of implementation and validity.
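The double robustness property stated in the abstract can be illustrated with the textbook augmented inverse-probability-weighted (AIPW) estimator. The sketch below is not the DisC2o-HD algorithm: it uses a single site, one covariate, and no calibration or surrogate likelihood step. The function name `aipw_ate` and the simulated data-generating process are assumptions made purely for illustration.

```python
import numpy as np

def aipw_ate(y, t, e_hat, m1_hat, m0_hat):
    """Textbook AIPW (doubly robust) estimate of the average treatment effect.

    y: outcomes, t: binary treatment indicator,
    e_hat: estimated propensity scores P(T=1 | X),
    m1_hat / m0_hat: predicted outcomes under treatment / control.
    """
    treated = m1_hat + t * (y - m1_hat) / e_hat
    control = m0_hat + (1 - t) * (y - m0_hat) / (1 - e_hat)
    return float(np.mean(treated - control))

# Hypothetical simulated data: one confounder x, true ATE = 2.0.
rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)
e_true = 1 / (1 + np.exp(-0.5 * x))      # true propensity score
t = rng.binomial(1, e_true)
y = 2.0 * t + x + rng.normal(size=n)

# Case A: correct propensity score, misspecified outcome model (constant zero).
ate_ps_only = aipw_ate(y, t, e_true, np.zeros(n), np.zeros(n))

# Case B: misspecified propensity score (constant 0.5), correct outcome model.
ate_or_only = aipw_ate(y, t, np.full(n, 0.5), 2.0 + x, x)

# Unadjusted difference in means, confounded by x.
naive = y[t == 1].mean() - y[t == 0].mean()

print(f"AIPW, propensity model correct only: {ate_ps_only:.3f}")
print(f"AIPW, outcome model correct only:    {ate_or_only:.3f}")
print(f"Naive difference in means:           {naive:.3f}")
```

In this toy setting both AIPW variants should land near the true effect of 2.0, while the unadjusted difference in means is biased upward because treated units tend to have larger `x`.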
Pages: 50
Related papers
  • [1] DisC2o-HD: Distributed causal inference with covariates shift for analyzing real-world high-dimensional data
    Tong, Jiayi
    Hu, Jie
    Hripcsak, George
    Ning, Yang
    Chen, Yong
    Journal of Machine Learning Research, 2025, 26
  • [2] Identification of Low-Dimensional Nonlinear Dynamics from High-Dimensional Simulated and Real-World Data
    Paglia, Chiara
    Stiehl, Annika
    Uhl, Christian
    CONTROLO 2022, 2022, 930 : 205 - 213
  • [3] The Optimal Ridge Penalty for Real-world High-dimensional Data Can Be Zero or Negative due to the Implicit Ridge Regularization
    Kobak, Dmitry
    Lomond, Jonathan
    Sanchez, Benoit
    JOURNAL OF MACHINE LEARNING RESEARCH, 2020, 21
  • [5] Federated causal inference based on real-world observational data sources: application to a SARS-CoV-2 vaccine effectiveness assessment
    Meurisse, Marjan
    Estupinan-Romero, Francisco
    Gonzalez-Galindo, Javier
    Martinez-Lizaga, Natalia
    Royo-Sierra, Santiago
    Saldner, Simon
    Dolanski-Aghamanoukjan, Lorenz
    Degelsegger-Marquez, Alexander
    Soiland-Reyes, Stian
    Van Goethem, Nina
    Bernal-Delgado, Enrique
    BMC MEDICAL RESEARCH METHODOLOGY, 2023, 23 (01)
  • [7] Machine learning methods for propensity and disease risk score estimation in high-dimensional data: a plasmode simulation and real-world data cohort analysis
    Guo, Yuchen
    Strauss, Victoria Y.
    Catala, Marti
    Jodicke, Annika M.
    Khalid, Sara
    Prieto-Alhambra, Daniel
    FRONTIERS IN PHARMACOLOGY, 2024, 15