Cafe: Improved Federated Data Imputation by Leveraging Missing Data Heterogeneity

Authors
Min, Sitao [1]
Asif, Hafiz [1, 3]
Wang, Xinyue [2]
Vaidya, Jaideep [1]
Affiliations
[1] Rutgers State Univ, Newark, NJ 07102 USA
[2] Renmin Univ China, Ctr Appl Stat, Beijing 100872, Peoples R China
[3] Hofstra Univ, Hempstead, NY 11549 USA
Keywords
Imputation; Data models; Distributed databases; Hospitals; Glucose; Predictive models; Computational modeling; Biological system modeling; Mathematical models; Protocols; Federated learning; missing data imputation; data quality; data heterogeneity
DOI
10.1109/TKDE.2025.3537403
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Federated learning (FL), a decentralized machine learning approach, offers strong performance while alleviating autonomy and confidentiality concerns. Despite FL's popularity, how to deal with missing values in a federated manner is not well understood. In this work, we initiate a study of federated imputation of missing values, particularly in complex scenarios where missing data heterogeneity exists and state-of-the-art (SOTA) approaches for federated imputation suffer a significant loss in imputation quality. We propose Cafe, a personalized FL approach for missing data imputation. Cafe is inspired by the observation that heterogeneity can induce differences in the observable and missing data distributions across clients, and that these differences can be leveraged to improve imputation quality. Cafe computes personalized weights that are automatically calibrated to the level of heterogeneity, which may remain unknown, to develop a personalized imputation model for each client. An extensive empirical evaluation over a variety of settings demonstrates that Cafe matches the performance of SOTA baselines in homogeneous settings while significantly outperforming them in heterogeneous settings.
Pages: 2266-2281
Page count: 16