Benchmarking Evaluation Protocols for Classifiers Trained on Differentially Private Synthetic Data

Authors
Movahedi, Parisa [1]
Nieminen, Valtteri [1, 2]
Perez, Ileana Montoya [1]
Daafane, Hiba [1]
Sukhwal, Dishant [1]
Pahikkala, Tapio [1]
Airola, Antti [1]
Affiliations
[1] Turku Univ, Dept Comp, Turku 20014, Finland
[2] Helsinki Univ Hosp HUS, Helsinki 00290, Finland
Source
IEEE Access | 2024 / Vol. 12
Keywords
Protocols; Synthetic data; Data models; Privacy; Analytical models; Machine learning; Bioinformatics; Classification algorithms; Differential privacy; Generative AI; Biomedical data; Classification; Model evaluation
DOI
10.1109/ACCESS.2024.3446913
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Differentially private (DP) synthetic data has emerged as a potential solution for sharing sensitive individual-level biomedical data. DP generative models offer a promising approach for generating realistic synthetic data that aims to preserve the central statistical properties of the original data while ensuring privacy by limiting the risk of disclosing sensitive information about individuals. However, how to assess the expected real-world prediction performance of machine learning models trained on synthetic data remains an open question. In this study, we experimentally evaluate two model evaluation protocols for classifiers trained on synthetic data. The first protocol uses only synthetic data for downstream model evaluation, whereas the second assumes limited DP access to a private test set of real data managed by a data curator. We also propose a metric for assessing how well the evaluation results of these protocols match the real-world prediction performance of the models. The metric measures both a systematic error component, indicating how optimistic or pessimistic a protocol is on average, and a random error component, indicating the variability of the protocol's error. Our results suggest that the second protocol is advantageous, particularly in biomedical health studies where research precision is of utmost importance. This comprehensive empirical study offers new insights into the practical feasibility and usefulness of different evaluation protocols for classifiers trained on DP synthetic data.
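The abstract names two ingredients that can be made concrete: limited DP access to a private real test set (the second protocol) and an assessment that splits a protocol's error into a systematic and a random component. The sketch below is a minimal, hypothetical illustration, not the authors' implementation. It assumes accuracy as the performance measure, releases it with the Laplace mechanism (an assumed choice of DP mechanism, with substitution neighbors and a public test-set size n), and summarizes a protocol's error as the mean and sample standard deviation of (estimated minus true) performance over repeated runs. The function names `dp_accuracy` and `protocol_error` are hypothetical.

```python
import random
import statistics


def dp_accuracy(y_true, y_pred, epsilon, rng):
    """Release test-set accuracy under epsilon-DP via the Laplace mechanism.

    Hypothetical helper. Assumes neighboring test sets differ by substituting
    one example and that the test-set size n is public, so accuracy has
    sensitivity 1/n.
    """
    n = len(y_true)
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / n
    scale = 1.0 / (n * epsilon)
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace(0, scale) distributed.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    # Clipping to [0, 1] is post-processing and does not weaken the DP guarantee.
    return min(1.0, max(0.0, acc + noise))


def protocol_error(estimates, true_scores):
    """Split a protocol's error into systematic and random components.

    Hypothetical helper.
    estimates:   performance reported by the evaluation protocol, one per run
    true_scores: real-world performance of the same models, one per run
    """
    errors = [e - t for e, t in zip(estimates, true_scores)]
    systematic = statistics.mean(errors)  # > 0: optimistic, < 0: pessimistic
    random_error = statistics.stdev(errors)  # sample std; needs >= 2 runs
    return systematic, random_error


if __name__ == "__main__":
    rng = random.Random(0)
    y_true = [1, 0, 1, 1, 0, 1, 0, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
    print(dp_accuracy(y_true, y_pred, epsilon=1.0, rng=rng))
    print(protocol_error([0.80, 0.90, 0.70], [0.75, 0.85, 0.75]))
```

In a simulation under these assumptions, one would call `dp_accuracy` on the real test set for each trained classifier and pass the released values, together with the models' performance on a large held-out set standing in for real-world performance, to `protocol_error`. A protocol with a small absolute systematic error and a small random error tracks real-world performance more closely; a smaller epsilon adds more noise and therefore tends to inflate the random component.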
Pages: 118637-118648
Number of pages: 12