Data Privacy Protection and Utility Preservation through Bayesian Data Synthesis: A Case Study on Airbnb Listings

Cited by: 1
Authors
Guo, Shijie [1]
Hu, Jingchen [2]
Affiliations
[1] Stanford Univ, Civil & Environm Engn Dept, Stanford, CA 94305 USA
[2] Vassar Coll, Math & Stat Dept, Poughkeepsie, NY 12601 USA
Source
AMERICAN STATISTICIAN | 2023 / Vol. 77 / Issue 02
Keywords
Attribute disclosure; Data privacy; Disclosure risk; Identification disclosure; Intruder's knowledge; Synthetic data
DOI
10.1080/00031305.2022.2077440
Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
When releasing record-level data containing sensitive information to the public, the data disseminator is responsible for protecting the privacy of every record in the dataset while preserving important features of the data for users' analyses. Both goals can be achieved by data synthesis, in which confidential data are replaced with synthetic data simulated from statistical models estimated on the confidential data. In this article, we present a data synthesis case study in which synthetic values of price and the number of available days in a sample of the New York Airbnb Open Data are created for privacy protection. One sensitive variable, the number of available days of an Airbnb listing, has a large number of zero-valued records and is also truncated at both ends. We propose a zero-inflated truncated Poisson regression model for its synthesis. We then use a sequential synthesis approach to synthesize the sensitive price variable. The resulting synthetic data are evaluated for their utility preservation and privacy protection, the latter in the form of disclosure risks. Furthermore, we propose methods to investigate how uncertainties in the intruder's knowledge influence the identification disclosure risks of the synthetic data. In particular, we explore several realistic scenarios of uncertainty in the intruder's knowledge of available information and evaluate their impact on the resulting identification disclosure risks.
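As a rough illustration of the kind of model the abstract names, the sketch below draws synthetic counts from a zero-inflated truncated Poisson regression. The abstract does not give the parametrization, so everything here is an assumption: a point mass at zero with a logistic link, a Poisson component with a log link renormalized to the range 1 to 365, and the hypothetical names `truncated_poisson_pmf`, `sample_zitp`, and `synthesize`. The fixed coefficient vectors `alpha` and `beta` stand in for a single posterior draw; a Bayesian synthesizer would repeat this for draws from the fitted posterior.

```python
import math
import random


def truncated_poisson_pmf(lam, lower, upper):
    """P(Y = k) for k in [lower, upper] under Poisson(lam) renormalized to that range."""
    # Work in log space so that large counts (e.g. up to 365) do not overflow.
    log_w = [k * math.log(lam) - lam - math.lgamma(k + 1)
             for k in range(lower, upper + 1)]
    peak = max(log_w)
    w = [math.exp(v - peak) for v in log_w]
    total = sum(w)
    return [v / total for v in w]


def sample_zitp(pi_zero, lam, lower=1, upper=365, rng=random):
    """Draw 0 with probability pi_zero, otherwise a truncated Poisson value."""
    if rng.random() < pi_zero:
        return 0
    pmf = truncated_poisson_pmf(lam, lower, upper)
    return rng.choices(range(lower, upper + 1), weights=pmf, k=1)[0]


def synthesize(covariates, alpha, beta, rng=random):
    """Generate one synthetic count per covariate row (assumed link functions)."""
    synthetic = []
    for x in covariates:
        # Assumed logistic link for the zero-inflation probability.
        pi_zero = 1.0 / (1.0 + math.exp(-sum(a * v for a, v in zip(alpha, x))))
        # Assumed log link for the Poisson rate.
        lam = math.exp(sum(b * v for b, v in zip(beta, x)))
        synthetic.append(sample_zitp(pi_zero, lam, rng=rng))
    return synthetic
```

Calling `synthesize(rows, alpha, beta, random.Random(0))` with made-up covariate rows returns integers that are either 0 or lie between 1 and 365, which mirrors the zero-heavy, doubly truncated shape the abstract describes for the number of available days.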
Pages: 192 - 200
Page count: 9
Related papers
50 in total
  • [1] A Data Privacy Preservation Approach and a Case Study in Data Analytics
    Salhi, Abdellah
    [J]. 4TH INNOVATION AND ANALYTICS CONFERENCE & EXHIBITION (IACE 2019), 2019, 2138
  • [2] RISK-EFFICIENT BAYESIAN DATA SYNTHESIS FOR PRIVACY PROTECTION
    Hu, Jingchen
    Savitsky, Terrance D.
    Williams, Matthew R.
    [J]. JOURNAL OF SURVEY STATISTICS AND METHODOLOGY, 2022, 10 (05) : 1370 - 1399
  • [3] JPEG-based scalable privacy protection and image data utility preservation
    Ruchaud, Natacha
    Dugelay, Jean-Luc
    [J]. IET SIGNAL PROCESSING, 2018, 12 (07) : 881 - 887
  • [4] Utility of Privacy Preservation for Health Data Publishing
    Wu, Lengdong
    He, Hua
    Zaiane, Osmar R.
    [J]. 2013 IEEE 26TH INTERNATIONAL SYMPOSIUM ON COMPUTER-BASED MEDICAL SYSTEMS (CBMS), 2013 : 510 - 511
  • [5] Privacy preservation of the user data and properly balancing between privacy and utility
    Yuvaraj N.
    Praghash K.
    Karthikeyan T.
    [J]. International Journal of Business Intelligence and Data Mining, 2022, 20 (04) : 394 - 411
  • [6] Privacy Preservation and Analytical Utility of E-Learning Data Mashups in the Web of Data
    Rodriguez-Garcia, Mercedes
    Balderas, Antonio
    Manuel Dodero, Juan
    [J]. APPLIED SCIENCES-BASEL, 2021, 11 (18)
  • [7] Privacy Issues and Data Protection in Big Data: A Case Study Analysis under GDPR
    Gruschka, Nils
    Mavroeidis, Vasileios
    Vishi, Kamer
    Jensen, Meiko
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2018 : 5027 - 5033
  • [8] The impact of privacy protection measures on the utility of crowdsourced cycling data
    Raturi, Varun
    Hong, Jinhyun
    McArthur, David Philip
    Livingston, Mark
    [J]. JOURNAL OF TRANSPORT GEOGRAPHY, 2021, 92
  • [9] Privacy and Utility Preservation for Location Data Using Stay Region Analysis
    Dash, Manoranjan
    Teo, Sin G.
    [J]. ADVANCED DATA MINING AND APPLICATIONS, ADMA 2017, 2017, 10604 : 808 - 820
  • [10] Granular data representation under privacy protection: Tradeoff between data utility and privacy via information granularity
    Zhang, Ge
    Zhu, Xiubin
    Yin, Li
    Pedrycz, Witold
    Li, Zhiwu
    [J]. APPLIED SOFT COMPUTING, 2022, 131