Generating synthetic personal health data using conditional generative adversarial networks combining with differential privacy

Cited by: 9
Authors
Sun, Chang [1, 2]
van Soest, Johan [3, 4]
Dumontier, Michel [1, 2]
Affiliations
[1] Maastricht Univ, Inst Data Sci, Fac Sci & Engn, Maastricht, Netherlands
[2] Maastricht Univ, Fac Sci & Engn, Dept Adv Comp Sci, Maastricht, Netherlands
[3] Maastricht Univ, Brightlands Inst Smart Soc, Fac Sci & Engn, Heerlen, Netherlands
[4] Maastricht Univ, GROW Sch Oncol & Reprod, Dept Radiat Oncol Maastro, Med Ctr, Maastricht, Netherlands
Keywords
Synthetic data; Synthetic health data; Generative adversarial network; Data privacy; Health data sharing;
DOI
10.1016/j.jbi.2023.104404
Chinese Library Classification number
TP39 [Computer applications]
Subject classification code
081203; 0835
Abstract
A large amount of personal health data that is highly valuable to the scientific community is still not accessible, or requires a lengthy request process, because of privacy concerns and legal restrictions. Synthetic data has been studied and proposed as a promising alternative. However, generating realistic and privacy-preserving synthetic personal health data remains challenging: the model must simulate the characteristics of patients in minority classes, capture the relations among variables in imbalanced data and transfer them to the synthetic data, and preserve individual patients' privacy. In this paper, we propose a differentially private conditional Generative Adversarial Network model (DP-CGANS) consisting of data transformation, sampling, conditioning, and network training to generate realistic and privacy-preserving personal data. Our model distinguishes categorical and continuous variables and transforms them into latent space separately for better training performance. We tackle the challenges of generating synthetic patient data that arise from the special characteristics of personal health data. For example, patients with a certain disease are typically the minority in the dataset, and the relations among variables are crucial to capture. Our model takes a conditional vector as an additional input to represent the minority class in the imbalanced data and to capture as much of the dependency between variables as possible. Moreover, we inject statistical noise into the gradients during the network training process of DP-CGANS to provide a differential privacy guarantee. We extensively evaluate our model against state-of-the-art generative models on personal socio-economic datasets and real-world personal health datasets in terms of statistical similarity, machine learning performance, and privacy measurement.
We demonstrate that our model outperforms other comparable models, especially in capturing the dependence between variables. Finally, we present the balance between data utility and privacy in synthetic data generation considering the different data structures and characteristics of real-world personal health data such as imbalanced classes, abnormal distributions, and data sparsity.
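The abstract's step of injecting statistical noise into the gradients can be illustrated with a generic sketch in the style of differentially private SGD: clip each per-example gradient, sum, add Gaussian noise, and average. This is not the DP-CGANS implementation; the function name `privatize_gradients` and the parameters `clip_norm` and `noise_multiplier` are assumptions made for illustration, and the record does not state which noise mechanism or privacy accounting the authors use.

```python
import numpy as np


def privatize_gradients(per_example_grads, clip_norm, noise_multiplier, rng):
    """Hypothetical helper: return a noisy average of clipped gradients.

    per_example_grads: array of shape (batch_size, n_params).
    clip_norm: maximum L2 norm allowed for each per-example gradient.
    noise_multiplier: noise std is noise_multiplier * clip_norm.
    rng: a numpy.random.Generator.
    """
    grads = np.asarray(per_example_grads, dtype=float)
    # L2 norm of each example's gradient, kept as a column for broadcasting.
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    # Scale down only the gradients whose norm exceeds clip_norm.
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = grads * scale
    # Gaussian noise calibrated to the clipping bound (assumed mechanism).
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=grads.shape[1])
    # Noisy sum divided by the batch size.
    return (clipped.sum(axis=0) + noise) / grads.shape[0]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grads = [[3.0, 4.0], [0.3, 0.4]]
    print(privatize_gradients(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng))
```

A larger `noise_multiplier` gives stronger privacy at the cost of noisier updates, which is the utility and privacy balance the abstract refers to. Turning the noise level into a formal privacy budget requires a separate accounting step that this sketch omits.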
Pages: 14