Cafe: Improved Federated Data Imputation by Leveraging Missing Data Heterogeneity

Authors
Min, Sitao [1]
Asif, Hafiz [1, 3]
Wang, Xinyue [2]
Vaidya, Jaideep [1]
Affiliations
[1] Rutgers State Univ, Newark, NJ 07102 USA
[2] Renmin Univ China, Ctr Appl Stat, Beijing 100872, Peoples R China
[3] Hofstra Univ, Hempstead, NY 11549 USA
Keywords
Imputation; Data models; Distributed databases; Hospitals; Glucose; Predictive models; Computational modeling; Biological system modeling; Mathematical models; Protocols; Federated learning; missing data imputation; data quality; data heterogeneity;
DOI
10.1109/TKDE.2025.3537403
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Federated learning (FL), a decentralized machine learning approach, offers great performance while alleviating autonomy and confidentiality concerns. Despite FL's popularity, how to deal with missing values in a federated manner is not well understood. In this work, we initiate a study of federated imputation of missing values, particularly in complex scenarios, where missing data heterogeneity exists and the state-of-the-art (SOTA) approaches for federated imputation suffer from significant loss in imputation quality. We propose Cafe, a personalized FL approach for missing data imputation. Cafe is inspired by the observation that heterogeneity can induce differences in observable and missing data distribution across clients, and that these differences can be leveraged to improve the imputation quality. Cafe computes personalized weights that are automatically calibrated for the level of heterogeneity, which can remain unknown, to develop personalized imputation models for each client. An extensive empirical evaluation over a variety of settings demonstrates that Cafe matches the performance of SOTA baselines in homogeneous settings while significantly outperforming the baselines in heterogeneous settings.
Pages: 2266-2281
Page count: 16