Algorithmically Effective Differentially Private Synthetic Data

Authors
He, Yiyun [1]
Vershynin, Roman [1]
Zhu, Yizhe [1]
Affiliation
[1] Univ Calif Irvine, Irvine, CA 92717 USA
Keywords
differential privacy; synthetic data; Wasserstein metric
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We present a highly effective algorithmic approach for generating $\varepsilon$-differentially private synthetic data in a bounded metric space with near-optimal utility guarantees under the 1-Wasserstein distance. In particular, for a dataset $X$ in the hypercube $[0,1]^d$, our algorithm generates a synthetic dataset $Y$ such that the expected 1-Wasserstein distance between the empirical measures of $X$ and $Y$ is $O((\varepsilon n)^{-1/d})$ for $d \ge 2$, and $O(\log^2(\varepsilon n)\,(\varepsilon n)^{-1})$ for $d = 1$. The accuracy guarantee is optimal up to a constant factor for $d \ge 2$, and up to a logarithmic factor for $d = 1$. Our algorithm has a fast running time of $O(\varepsilon d n)$ for all $d \ge 1$ and demonstrates improved accuracy compared to the method in (Boedihardjo et al., 2022c) for $d \ge 2$.
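The abstract states the guarantees but not the algorithm itself. The following is therefore only a minimal Python sketch of a much simpler baseline, not the authors' method. It is an $\varepsilon$-differentially private synthetic-data generator for $d = 1$ built on a Laplace-perturbed histogram, plus a helper that evaluates the 1-Wasserstein distance between two empirical measures on the line as $\int |F_X(t) - F_Y(t)|\,dt$. The function names (`laplace`, `dp_histogram_synthetic`, `wasserstein1_1d`) and the uniform grid with `num_bins` cells are hypothetical choices made for illustration. The sketch also assumes that neighbouring datasets have equal size and differ by replacing one point, which gives the count vector $L_1$ sensitivity 2.

```python
import random


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with rate 1/scale is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_histogram_synthetic(data, epsilon, num_bins, m, rng):
    """Hypothetical baseline (NOT the paper's algorithm): epsilon-DP synthetic
    data on [0, 1] from a Laplace-perturbed histogram.

    Assumes every point lies in [0, 1] and that neighbouring datasets differ by
    replacing one point, so the count vector has L1 sensitivity 2.
    """
    counts = [0] * num_bins
    for x in data:
        # Map x to its grid cell; x == 1.0 falls into the last cell.
        counts[min(int(x * num_bins), num_bins - 1)] += 1

    # Laplace mechanism: noise scale = sensitivity / epsilon.
    scale = 2.0 / epsilon
    noisy = [c + laplace(scale, rng) for c in counts]

    # Post-processing (privacy is preserved): clip negative counts to zero,
    # falling back to the uniform distribution if nothing survives.
    weights = [max(w, 0.0) for w in noisy]
    if sum(weights) == 0:
        weights = [1.0] * num_bins

    # Draw m cells proportionally to the weights, then a uniform point in each.
    cells = rng.choices(range(num_bins), weights=weights, k=m)
    return [(c + rng.random()) / num_bins for c in cells]


def wasserstein1_1d(xs, ys):
    """1-Wasserstein distance between the empirical measures of two non-empty
    samples on the real line, computed as the integral of |F_X - F_Y|."""
    xs, ys = sorted(xs), sorted(ys)
    points = sorted(xs + ys)
    n, m = len(xs), len(ys)
    i = j = 0
    total = 0.0
    for a, b in zip(points, points[1:]):
        # i = #{x <= a}, j = #{y <= a}; both empirical CDFs are constant on [a, b).
        while i < n and xs[i] <= a:
            i += 1
        while j < m and ys[j] <= a:
            j += 1
        total += abs(i / n - j / m) * (b - a)
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    X = [rng.random() ** 2 for _ in range(1000)]  # toy data in [0, 1]
    Y = dp_histogram_synthetic(X, epsilon=1.0, num_bins=32, m=len(X), rng=rng)
    print("W1(X, Y) =", wasserstein1_1d(X, Y))
```

In this baseline the number of bins controls a trade-off. A finer grid reduces the discretization error, but it spreads the same data over more noisy counts. Clipping and sampling act only on the noisy histogram, so they are post-processing and do not weaken the $\varepsilon$-DP guarantee.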
Pages: 28