Cafe: Improved Federated Data Imputation by Leveraging Missing Data Heterogeneity

Cited by: 0

Authors:

Min, Sitao [1]

Asif, Hafiz [1,3]

Wang, Xinyue [2]

Vaidya, Jaideep [1]

Affiliations:

[1] Rutgers State Univ, Newark, NJ 07102 USA

[2] Renmin Univ China, Ctr Appl Stat, Beijing 100872, Peoples R China

[3] Hofstra Univ, Hempstead, NY 11549 USA

Source:

IEEE Transactions on Knowledge and Data Engineering | 2025, Vol. 37, No. 5

Keywords:

Imputation; Data models; Distributed databases; Hospitals; Glucose; Predictive models; Computational modeling; Biological system modeling; Mathematical models; Protocols; Federated learning; Missing data imputation; Data quality; Data heterogeneity

DOI:

10.1109/TKDE.2025.3537403

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Federated learning (FL), a decentralized machine learning approach, offers strong performance while alleviating autonomy and confidentiality concerns. Despite FL's popularity, how to deal with missing values in a federated manner is not well understood. In this work, we initiate a study of federated imputation of missing values, particularly in complex scenarios where missing data heterogeneity exists and the state-of-the-art (SOTA) approaches for federated imputation suffer a significant loss in imputation quality. We propose Cafe, a personalized FL approach for missing data imputation. Cafe is inspired by the observation that heterogeneity can induce differences in the observable and missing data distributions across clients, and that these differences can be leveraged to improve imputation quality. Cafe computes personalized weights that are automatically calibrated for the level of heterogeneity, which can remain unknown, to develop personalized imputation models for each client. An extensive empirical evaluation over a variety of settings demonstrates that Cafe matches the performance of SOTA baselines in homogeneous settings while significantly outperforming the baselines in heterogeneous settings.
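
The abstract does not give the formula Cafe uses for its personalized weights, so the following is only a minimal sketch of the general idea of personalized weighted aggregation, not the authors' algorithm. It assumes each client shares a small summary-statistic vector and a parameter vector for its local imputation model; the functions `personalized_weights` and `personalized_aggregate`, the distance-based softmax, and the `temperature` parameter are all hypothetical choices made for illustration.

```python
import math


def personalized_weights(client_stats, temperature=1.0):
    """Return a row-stochastic matrix: row i holds the weights client i
    assigns to every client's model (illustrative assumption, not Cafe's rule)."""
    weights = []
    for s_i in client_stats:
        # Squared Euclidean distance between summary-statistic vectors.
        dists = [sum((a - b) ** 2 for a, b in zip(s_i, s_j)) for s_j in client_stats]
        # Closer clients receive exponentially larger scores.
        scores = [math.exp(-d / temperature) for d in dists]
        total = sum(scores)
        weights.append([s / total for s in scores])
    return weights


def personalized_aggregate(client_params, weights):
    """Build one weighted-average parameter vector per client."""
    dim = len(client_params[0])
    return [
        [sum(w * p[k] for w, p in zip(row, client_params)) for k in range(dim)]
        for row in weights
    ]


# Toy data: clients 0 and 1 have identical summaries, client 2 differs.
stats = [[0.1, 0.2], [0.1, 0.2], [0.9, 0.8]]
params = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0]]
W = personalized_weights(stats, temperature=0.1)
models = personalized_aggregate(params, W)
```

In this toy setup, clients with identical summaries weight each other equally, and when all summaries coincide the weights become uniform, so the aggregate reduces to a plain average of the client parameters. How Cafe itself calibrates its weights to an unknown level of heterogeneity is described in the paper, not here.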

Pages: 2266-2281

Page count: 16
