Benchmarking Evaluation Protocols for Classifiers Trained on Differentially Private Synthetic Data

Cited by: 0
Authors
Movahedi, Parisa [1]
Nieminen, Valtteri [1,2]
Perez, Ileana Montoya [1]
Daafane, Hiba [1]
Sukhwal, Dishant [1]
Pahikkala, Tapio [1]
Airola, Antti [1]
Affiliations
[1] Turku Univ, Dept Comp, Turku 20014, Finland
[2] Helsinki Univ Hosp HUS, Helsinki 00290, Finland
Source
IEEE ACCESS | 2024 / Volume 12
Keywords
Protocols; Synthetic data; Data models; Privacy; Analytical models; Machine learning; Bioinformatics; Classification algorithms; Differential privacy; Generative AI; Biomedical data; classification; model evaluation
DOI
10.1109/ACCESS.2024.3446913
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Differentially private (DP) synthetic data has emerged as a potential solution for sharing sensitive individual-level biomedical data. DP generative models offer a promising approach for generating realistic synthetic data that aims to maintain the original data's central statistical properties while ensuring privacy by limiting the risk of disclosing sensitive information about individuals. However, how to assess the expected real-world prediction performance of machine learning models trained on synthetic data remains an open question. In this study, we experimentally evaluate two model evaluation protocols for classifiers trained on synthetic data. The first protocol uses only synthetic data for downstream model evaluation, whereas the second protocol assumes limited DP access to a private test set of real data managed by a data curator. We also propose a metric for assessing how well the evaluation results of the two protocols match the real-world prediction performance of the models. The assessment measures both the systematic error component, indicating how optimistic or pessimistic the protocol is on average, and the random error component, indicating the variability of the protocol's error. The results of our study suggest that employing the second protocol is advantageous, particularly in biomedical studies where the precision of the research is of utmost importance. Our comprehensive empirical study offers new insights into the practical feasibility and usefulness of different evaluation protocols for classifiers trained on DP-synthetic data.
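The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. It makes two assumptions: the systematic error is taken to be the mean signed difference between a protocol's estimate and the real-world score, with the sample standard deviation of that difference as the random error, and the second protocol's "limited DP access" is modelled as a curator releasing test accuracy through the Laplace mechanism. The function names error_components and dp_accuracy, and the choice of accuracy as the score, are hypothetical.

```python
import random
import statistics


def error_components(estimated, actual):
    """Split a protocol's evaluation error into two parts (illustrative only).

    estimated: scores reported by the evaluation protocol, one per repetition
    actual:    real-world scores of the same models
    Returns (systematic, random_error). A positive systematic value means the
    protocol is optimistic on average, a negative value means pessimistic.
    """
    errors = [e - a for e, a in zip(estimated, actual)]
    return statistics.mean(errors), statistics.stdev(errors)


def dp_accuracy(y_true, y_pred, epsilon, rng):
    """Release accuracy on a private test set with Laplace noise (assumed mechanism).

    Replacing one test record changes the accuracy by at most 1/n, so the
    Laplace scale is 1 / (n * epsilon). The result is clipped to [0, 1].
    """
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    scale = 1.0 / (n * epsilon)
    # The difference of two unit exponentials is a standard Laplace variate.
    noise = scale * (rng.expovariate(1.0) - rng.expovariate(1.0))
    return min(1.0, max(0.0, accuracy + noise))


if __name__ == "__main__":
    rng = random.Random(0)
    labels = [rng.randint(0, 1) for _ in range(500)]
    # Hypothetical classifier that is right about 80% of the time.
    preds = [y if rng.random() < 0.8 else 1 - y for y in labels]
    print("DP accuracy estimate:", dp_accuracy(labels, preds, epsilon=1.0, rng=rng))
    print("error components:", error_components([0.85, 0.80, 0.78], [0.80, 0.79, 0.80]))
```

In this sketch a smaller epsilon or a smaller test set widens the noise, which would show up as a larger random error component for the second protocol.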
Pages: 118637-118648
Number of pages: 12