Algorithmically Effective Differentially Private Synthetic Data

Authors
He, Yiyun [1]
Vershynin, Roman [1]
Zhu, Yizhe [1]
Affiliations
[1] Univ Calif Irvine, Irvine, CA 92717 USA
Keywords
differential privacy; synthetic data; Wasserstein metric
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present a highly effective algorithmic approach for generating $\varepsilon$-differentially private synthetic data in a bounded metric space with near-optimal utility guarantees under the 1-Wasserstein distance. In particular, for a dataset $X$ in the hypercube $[0, 1]^d$, our algorithm generates a synthetic dataset $Y$ such that the expected 1-Wasserstein distance between the empirical measures of $X$ and $Y$ is $O((\varepsilon n)^{-1/d})$ for $d \geq 2$, and is $O(\log^2(\varepsilon n)\,(\varepsilon n)^{-1})$ for $d = 1$. The accuracy guarantee is optimal up to a constant factor for $d \geq 2$, and up to a logarithmic factor for $d = 1$. Our algorithm has a fast running time of $O(\varepsilon d n)$ for all $d \geq 1$ and demonstrates improved accuracy compared to the method in (Boedihardjo et al., 2022c) for $d \geq 2$.
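To give a feel for how the stated accuracy rates scale with the dimension, the following is a minimal illustrative sketch, not the authors' algorithm. It only evaluates the two rate expressions from the abstract, under the assumption that every constant hidden by the O-notation equals 1 and that the logarithm is natural. The function name `w1_rate` is hypothetical.

```python
import math


def w1_rate(n: int, eps: float, d: int) -> float:
    """Evaluate the abstract's accuracy rate with all hidden constants set to 1.

    Assumption: constants and the log base are not given in the abstract, so
    the returned value shows scaling only, not an actual error bound.
    """
    if n <= 0 or eps <= 0 or d < 1:
        raise ValueError("need n > 0, eps > 0 and d >= 1")
    m = eps * n  # the rates depend on n and eps only through their product
    if d == 1:
        # d = 1: log^2(eps * n) / (eps * n); meaningful when eps * n is large
        return math.log(m) ** 2 / m
    # d >= 2: (eps * n)^(-1/d)
    return m ** (-1.0 / d)


if __name__ == "__main__":
    # Same sample size and privacy level, increasing dimension.
    for d in (1, 2, 4, 8):
        print(f"d={d}: rate ~ {w1_rate(10_000, 1.0, d):.4f}")
```

For fixed values of n and epsilon, the expression for d >= 2 decays more slowly as d grows, which is the dependence on dimension that the abstract's bound makes explicit.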
Pages: 28
Related papers
50 in total
  • [31] Differentially Private Grids for Geospatial Data
    Qardaji, Wahbeh
    Yang, Weining
    Li, Ninghui
    2013 IEEE 29TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2013, : 757 - 768
  • [32] Differentially private multidimensional data publishing
    Khalil Al-Hussaeni
    Benjamin C. M. Fung
    Farkhund Iqbal
    Junqiang Liu
    Patrick C. K. Hung
    Knowledge and Information Systems, 2018, 56 : 717 - 752
  • [33] Differentially Private Methods for Compositional Data
    Guo, Qi
    Barrientos, Andres F.
    Pena, Victor
    JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2024
  • [34] Differentially Private Algorithms for Synthetic Power System Datasets
    Dvorkin, Vladimir
    Botterud, Audun
    IEEE CONTROL SYSTEMS LETTERS, 2023, 7 : 2053 - 2058
  • [35] Differentially private low-dimensional synthetic data from high-dimensional datasets
    He, Yiyun
    Strohmer, Thomas
    Vershynin, Roman
    Zhu, Yizhe
    INFORMATION AND INFERENCE-A JOURNAL OF THE IMA, 2025, 14 (01)
  • [36] Examining the Utility of Differentially Private Synthetic Data Generated using Variational Autoencoder with TensorFlow Privacy
    Tai, Bo-Chen
    Li, Szu-Chuang
    Huang, Yennun
    Wang, Pang-Chieh
    2022 IEEE 27TH PACIFIC RIM INTERNATIONAL SYMPOSIUM ON DEPENDABLE COMPUTING (PRDC), 2022, : 236 - 241
  • [37] Distributed Synthetic Time-Series Data Generation With Local Differentially Private Federated Learning
    Jiang, Xue
    Zhou, Xuebing
    Grossklags, Jens
    IEEE ACCESS, 2024, 12 : 157067 - 157082
  • [38] Differentially private data publishing for arbitrarily partitioned data
    Wang, Rong
    Fung, Benjamin C. M.
    Zhu, Yan
    Peng, Qiang
    INFORMATION SCIENCES, 2021, 553 : 247 - 265
  • [39] Turbo: Effective Caching in Differentially-Private Databases
    Kostopoulou, Kelly
    Tholoniat, Pierre
    Cidon, Asaf
    Geambasu, Roxana
    Lecuyer, Mathias
    PROCEEDINGS OF THE TWENTY-NINTH ACM SYMPOSIUM ON OPERATING SYSTEMS PRINCIPLES, SOSP 2023, 2023, : 579 - +
  • [40] Assessment of differentially private synthetic data for utility and fairness in end-to-end machine learning pipelines for tabular data
    Pereira, Mayana
    Kshirsagar, Meghana
    Mukherjee, Sumit
    Dodhia, Rahul
    Lavista Ferres, Juan
    de Sousa, Rafael
    PLOS ONE, 2024, 19 (02)