Cafe: Improved Federated Data Imputation by Leveraging Missing Data Heterogeneity

Authors
Min, Sitao [1]
Asif, Hafiz [1, 3]
Wang, Xinyue [2]
Vaidya, Jaideep [1]
Affiliations
[1] Rutgers State Univ, Newark, NJ 07102 USA
[2] Renmin Univ China, Ctr Appl Stat, Beijing 100872, Peoples R China
[3] Hofstra Univ, Hempstead, NY 11549 USA
Keywords
Imputation; Data models; Distributed databases; Hospitals; Glucose; Predictive models; Computational modeling; Biological system modeling; Mathematical models; Protocols; Federated learning; missing data imputation; data quality; data heterogeneity
DOI
10.1109/TKDE.2025.3537403
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Federated learning (FL), a decentralized machine learning approach, offers great performance while alleviating autonomy and confidentiality concerns. Despite FL's popularity, how to deal with missing values in a federated manner is not well understood. In this work, we initiate a study of federated imputation of missing values, particularly in complex scenarios where missing data heterogeneity exists and the state-of-the-art (SOTA) approaches for federated imputation suffer a significant loss in imputation quality. We propose Cafe, a personalized FL approach for missing data imputation. Cafe is inspired by the observation that heterogeneity can induce differences in the observable and missing data distributions across clients, and that these differences can be leveraged to improve imputation quality. Cafe computes personalized weights that are automatically calibrated to the level of heterogeneity, which can remain unknown, to develop a personalized imputation model for each client. An extensive empirical evaluation over a variety of settings demonstrates that Cafe matches the performance of SOTA baselines in homogeneous settings while significantly outperforming the baselines in heterogeneous settings.
Pages: 2266-2281
Page count: 16