Differentially Private Data Generation with Missing Data

Authors
Mohapatra, Shubhankar [1]
Zong, Jianqiao [1]
Kerschbaum, Florian [1]
He, Xi [1]
Affiliations
[1] Univ Waterloo, Waterloo, ON, Canada
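Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2024 / Volume 17 / Issue 08
Keywords
IMPUTATION; MIXTURE; RECORDS;
DOI
10.14778/3659437.3659455
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract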
Although several works succeed in generating synthetic data with differential privacy (DP) guarantees, they are inadequate for generating high-quality synthetic data when the input data has missing values. In this work, we formalize the problems of DP synthetic data generation with missing values and propose three effective adaptive strategies that significantly improve the utility of the synthetic data on four real-world datasets with different types and levels of missing data and privacy requirements. We also identify the relationship between the privacy impact on the complete ground truth data and on the incomplete data for these DP synthetic data generation algorithms. We model the missing mechanisms as a sampling process to obtain tighter upper bounds for the privacy guarantees for the ground truth data. Overall, this study contributes to a better understanding of the challenges and opportunities for using private synthetic data generation algorithms in the presence of missing data.
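The idea of treating missingness as a sampling process can be illustrated with the textbook privacy-amplification-by-subsampling bound. This is a minimal sketch and not the paper's own theorem: it assumes each record is observed independently with probability q (a hypothetical, MCAR-style missing mechanism), add/remove-one-record neighboring datasets, and a generator that is epsilon-DP on the observed records. The function name `amplified_epsilon` and the parameter `q` are illustrative assumptions.

```python
import math


def amplified_epsilon(epsilon: float, q: float) -> float:
    """Standard subsampling amplification: eps' = ln(1 + q * (e^eps - 1)).

    epsilon: privacy parameter of the mechanism on the observed (sampled) records.
    q: assumed probability that a record is observed at all (hypothetical).
    Returns the effective epsilon with respect to the complete ground truth data.
    """
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    # log1p and expm1 keep the computation accurate for small epsilon or q.
    return math.log1p(q * math.expm1(epsilon))


if __name__ == "__main__":
    for q in (0.25, 0.5, 0.9, 1.0):
        print(f"q={q:.2f}  effective epsilon={amplified_epsilon(1.0, q):.4f}")
```

Under these assumptions the effective epsilon never exceeds the nominal one and shrinks as more records go missing; with q = 1 (no missing records) the bound reduces to the original epsilon.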
Pages: 2022-2035
Page count: 14