Improving Synthetic Data Generation Through Federated Learning in Scarce and Heterogeneous Data Scenarios

Cited by: 0
Authors
Apellaniz, Patricia A. [1]
Parras, Juan [1]
Zazo, Santiago [1]
Affiliations
[1] Univ Politecn Madrid, Informat Proc & Telecommun Ctr, ETS Ingn Telecomunicac, Madrid 28040, Spain
Funding
EU Horizon 2020;
Keywords
synthetic data generation; federated learning; medical data; data scarcity; data heterogeneity; OBESITY;
DOI
10.3390/bdcc9020018
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Synthetic Data Generation (SDG) is a promising solution for healthcare: it can generate synthetic patient data that closely resemble real-world data while preserving privacy. However, data scarcity and heterogeneity, particularly in under-resourced regions, hinder the effective implementation of SDG. This paper addresses these challenges by using Federated Learning (FL) for SDG, focusing on sharing synthetic patients across nodes. We hypothesize that, by leveraging collective knowledge and diverse data distributions, sharing synthetic data can significantly enhance the quality and representativeness of generated data, particularly for institutions with limited or biased datasets. This approach aligns with meta-learning concepts such as Domain Randomized Search. We compare two FL techniques: FedAvg and Synthetic Data Sharing (SDS), the latter being our proposed contribution. Both approaches are evaluated using variational autoencoders with Bayesian Gaussian mixture models across diverse medical datasets. Our results demonstrate that while both methods improve SDG, SDS consistently outperforms FedAvg, producing higher-quality, more representative synthetic data. In non-IID scenarios, FedAvg reduces divergence by 13-27% compared to isolated training, whereas SDS achieves reductions exceeding 50% in the worst-performing nodes. These findings underscore the potential of synthetic data sharing to reduce disparities between data-rich and data-poor institutions, fostering more equitable healthcare research and innovation.
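The two strategies compared in the abstract can be contrasted with a small sketch. This is an illustration under strong assumptions, not the authors' implementation: a one-dimensional Gaussian fit stands in for the variational autoencoder with a Bayesian Gaussian mixture model, and the exchange protocol in `synthetic_data_sharing` (every node publishes samples and refits on its own data plus the others' samples) is a guess at one plausible reading of "sharing synthetic patients across nodes". All function names, node sizes, and distributions are hypothetical.

```python
import random
import statistics


def fedavg(params, sizes):
    """FedAvg-style aggregation: average parameter vectors weighted by node size."""
    total = sum(sizes)
    dim = len(params[0])
    return [sum(p[i] * n for p, n in zip(params, sizes)) / total for i in range(dim)]


def fit_gaussian(samples):
    """Toy stand-in for a generative model: [mean, std] of a 1-D sample."""
    return [statistics.fmean(samples), statistics.pstdev(samples)]


def sample_gaussian(model, n, rng):
    """Draw n synthetic values from the toy model."""
    mean, std = model
    return [rng.gauss(mean, std) for _ in range(n)]


def synthetic_data_sharing(local_data, n_shared, rng):
    """Assumed protocol: each node fits locally, publishes synthetic samples,
    then refits on its real data plus the other nodes' synthetic samples."""
    local_models = [fit_gaussian(d) for d in local_data]
    shared = [sample_gaussian(m, n_shared, rng) for m in local_models]
    refit = []
    for k, data in enumerate(local_data):
        pooled = list(data)
        for j, synth in enumerate(shared):
            if j != k:  # a node does not re-ingest its own synthetic samples
                pooled.extend(synth)
        refit.append(fit_gaussian(pooled))
    return refit


rng = random.Random(0)
# Hypothetical non-IID setup: one data-rich node and one data-poor, shifted node.
node_a = [rng.gauss(0.0, 1.0) for _ in range(500)]
node_b = [rng.gauss(3.0, 1.0) for _ in range(20)]

local_models = [fit_gaussian(node_a), fit_gaussian(node_b)]
global_model = fedavg(local_models, [len(node_a), len(node_b)])
sds_models = synthetic_data_sharing([node_a, node_b], n_shared=200, rng=rng)

print("local :", local_models)
print("fedavg:", global_model)
print("sds   :", sds_models)
```

The structural difference is what the sketch is meant to show: FedAvg exchanges model parameters and yields a single shared model, while the sharing variant exchanges only generated samples and leaves each node with its own refitted model. The toy numbers say nothing about the divergence reductions reported in the abstract.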
Pages: 30