Synthetic Census Microdata Generation: A Comparative Study of Synthesis Methods Examining the Trade-Off Between Disclosure Risk and Utility

Cited by: 0
Authors
Little, Claire [1]
Allmendinger, Richard [1]
Elliot, Mark [1]
Affiliations
[1] Univ Manchester, Ctr Digital Trust & Soc, Oxford Rd, Manchester M13 9PL, England
Keywords
synthetic data; data utility; disclosure risk; UK SAMPLES; IMPACT;
DOI
10.1177/0282423X241266523
Chinese Library Classification (CLC)
O1 [Mathematics]; C [Social Sciences, General];
Discipline classification codes
03 ; 0303 ; 0701 ; 070101 ;
Abstract
There is growing interest in synthetic data generation as a means of allowing access to useful data whilst preserving confidentiality. In particular, synthetic microdata generation could allow increased access to census and administrative data. An accurate understanding of the comparative performance of current synthetic data generators, in terms of the resulting data utility and disclosure risk for synthetic microdata, is important in allowing data owners to make informed decisions about the choice of method and parameter settings to use. Synthesizing microdata can present challenges as the data typically contains predominantly categorical variables that standard statistical methods may struggle to process. In this paper we present the first in-depth evaluation of four state-of-the-art synthetic data generators originating from the statistical (synthpop, DataSynthesizer) and deep learning (CTGAN, TVAE) communities and each capable of dealing with microdata. We use four real census microdatasets (Canada, Fiji, Rwanda, UK) to systematically validate and compare the synthetic data generators and their parameter settings in terms of the utility and disclosure risk of the resulting synthetic data using statistical metrics and the risk-utility map for visualization. Our analysis shows that the performance of the synthetic data generators considered depends on their parameter settings and the dataset.
Pages: 255-308
Page count: 54