Cafe: Improved Federated Data Imputation by Leveraging Missing Data Heterogeneity

Times cited: 0
Authors
Min, Sitao [1]
Asif, Hafiz [1,3]
Wang, Xinyue [2]
Vaidya, Jaideep [1]
Affiliations
[1] Rutgers State Univ, Newark, NJ 07102 USA
[2] Renmin Univ China, Ctr Appl Stat, Beijing 100872, Peoples R China
[3] Hofstra Univ, Hempstead, NY 11549 USA
Keywords
Imputation; Data models; Distributed databases; Hospitals; Glucose; Predictive models; Computational modeling; Biological system modeling; Mathematical models; Protocols; Federated learning; missing data imputation; data quality; data heterogeneity
DOI
10.1109/TKDE.2025.3537403
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Federated learning (FL), a decentralized machine learning approach, offers strong performance while alleviating autonomy and confidentiality concerns. Despite FL's popularity, how to handle missing values in a federated manner is not well understood. In this work, we initiate a study of federated imputation of missing values, particularly in complex scenarios where missing data heterogeneity exists and state-of-the-art (SOTA) approaches to federated imputation suffer a significant loss in imputation quality. We propose Cafe, a personalized FL approach for missing data imputation. Cafe is inspired by the observation that heterogeneity can induce differences in the observable and missing data distributions across clients, and that these differences can be leveraged to improve imputation quality. Cafe computes personalized weights that are automatically calibrated to the level of heterogeneity, which need not be known in advance, and uses them to develop a personalized imputation model for each client. An extensive empirical evaluation over a variety of settings demonstrates that Cafe matches the performance of SOTA baselines in homogeneous settings while significantly outperforming them in heterogeneous settings.
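The abstract does not spell out how Cafe's personalized weights are computed, so the following is only a minimal sketch of the general idea of personalized, heterogeneity-aware aggregation, not Cafe's published algorithm. It assumes, hypothetically, that each client summarizes its data by a per-feature missing-rate vector (its "missingness profile"), that client i weights client j by an exponential kernel on the distance between their profiles, and that the kernel bandwidth is set to the median nonzero pairwise distance so no heterogeneity level has to be specified by hand. The helper names `missing_rates`, `personalized_weights`, and `personalize` are illustrative inventions.

```python
import math
from statistics import median


def missing_rates(rows):
    """Per-feature fraction of missing entries (encoded as None) in one client's data."""
    n_rows, n_features = len(rows), len(rows[0])
    return [sum(row[j] is None for row in rows) / n_rows for j in range(n_features)]


def _distance(a, b):
    """Euclidean distance between two missingness profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def personalized_weights(profiles):
    """Return a k x k matrix whose row i holds client i's aggregation weights.

    Hypothetical heuristic (not Cafe's published rule): an exponential kernel on
    profile distances, with the bandwidth set to the median nonzero pairwise
    distance so that no heterogeneity level has to be supplied by hand.
    """
    k = len(profiles)
    dist = [[_distance(profiles[i], profiles[j]) for j in range(k)] for i in range(k)]
    nonzero = [d for row in dist for d in row if d > 0]
    if not nonzero:
        # Identical profiles (homogeneous setting): plain uniform averaging.
        return [[1.0 / k] * k for _ in range(k)]
    tau = median(nonzero)
    weights = []
    for i in range(k):
        raw = [math.exp(-dist[i][j] / tau) for j in range(k)]
        total = sum(raw)
        weights.append([r / total for r in raw])
    return weights


def personalize(params, weights):
    """Each client's personalized parameters: weighted average of all clients' parameters."""
    k, dim = len(params), len(params[0])
    return [
        [sum(w[j] * params[j][d] for j in range(k)) for d in range(dim)]
        for w in weights
    ]


if __name__ == "__main__":
    # Toy data: three clients, two features, None marks a missing value.
    clients = [
        [[1.0, None], [2.0, None], [3.0, 4.0]],
        [[1.5, None], [None, 2.0], [2.5, 3.0]],
        [[None, 1.0], [None, 2.0], [0.5, 3.0]],
    ]
    profiles = [missing_rates(c) for c in clients]
    W = personalized_weights(profiles)
    # Hypothetical locally trained imputation-model parameters, one vector per client.
    local_params = [[0.9, 0.1], [1.0, 0.0], [0.2, 0.8]]
    for i, theta in enumerate(personalize(local_params, W)):
        print(f"client {i}: weights={[round(w, 3) for w in W[i]]}, "
              f"params={[round(t, 3) for t in theta]}")
```

In this sketch, identical missingness profiles collapse the weights to a uniform average, so every client receives the same global model, while diverging profiles shift each client's weight toward clients whose missingness looks like its own. That mirrors the qualitative behavior the abstract reports (parity in homogeneous settings, gains under heterogeneity), but the actual weighting scheme should be taken from the paper itself.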
Pages: 2266-2281
Number of pages: 16