Data privacy preserving scheme using generalised linear models

Cited by: 4
Authors
Lee, Min Cherng [1]
Mitra, Robin [2]
Lazaridis, Emmanuel [3]
Lai, An-Chow [1]
Goh, Yong Kheng [1]
Yap, Wun-She [1]
Affiliations
[1] Univ Tunku Abdul Rahman, Lee Kong Chian Fac Engn & Sci, Kampar, Perak, Malaysia
[2] Univ Southampton, Southampton Stat Sci Res Inst, Southampton, Hants, England
[3] UCL, Natl Inst Cardiovasc Outcomes Res, London, England
Keywords
Disclosure control; Data privacy; Multiple imputation; Generalised linear models; Synthetic data
DOI
10.1016/j.cose.2016.12.009
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
When releasing data for public use, statistical agencies seek to reduce the risk of disclosure while preserving the utility of the released data. Commonly used approaches (such as adding random noise, top-coding variables and swapping data values) distort the relationships in the original data. To preserve utility and reduce the risk of disclosure for the released data, we consider the synthetic data approach, in which we release multiply imputed, partially synthetic data sets that comprise original data values, with values at high disclosure risk replaced by synthetic values. To generate such synthetic data, we introduce a new variant of the factored regression model proposed by Lee and Mitra in 2016. In addition, we propose a new algorithm for identifying the original data that need to be replaced with synthetic data. More importantly, this algorithm for identifying original data with high disclosure risk can also be applied to other existing statistical disclosure control schemes. With our proposed scheme, data privacy can be preserved because it is difficult to identify an individual when the released synthetic data are not entirely the same as the original data. Moreover, valid inferences about the data can be made using simple combining rules, which take into account the uncertainty due to the presence of synthetic values. To evaluate the performance of our proposed scheme in terms of the risk of disclosure and the utility of the released synthetic data, we conduct an experiment on a data set taken from the 1987 National Indonesia Contraceptive Prevalence Survey. The results support the applicability of our proposed data privacy preserving scheme in reducing the risk of disclosure while preserving the utility of the released data. (C) 2016 Elsevier Ltd. All rights reserved.
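The abstract mentions "simple combining rules" without stating them. As an illustration only, the sketch below assumes the standard combining rules for partially synthetic data (Reiter, 2003), which may differ in detail from what the paper uses. The function name and the numbers are hypothetical and are not taken from the paper.

```python
from statistics import mean, variance


def combine_partially_synthetic(estimates, variances):
    """Combine results from m partially synthetic data sets.

    estimates: point estimate of the same quantity from each data set
    variances: estimated variance of that point estimate in each data set
    Assumes the Reiter (2003) rules for partially synthetic data.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need m >= 2 estimates with matching variances")
    q_bar = mean(estimates)    # combined point estimate
    b_m = variance(estimates)  # between-data-set variance (divisor m - 1)
    u_bar = mean(variances)    # average within-data-set variance
    t_p = u_bar + b_m / m      # total variance under the assumed rules
    return q_bar, t_p


# Hypothetical estimates from m = 3 synthetic data sets (illustrative only).
q_bar, t_p = combine_partially_synthetic([1.0, 1.2, 0.8], [0.04, 0.05, 0.03])
print(q_bar, t_p)
```

The between-data-set term b_m / m is what carries the extra uncertainty introduced by replacing values with synthetic draws; with a single synthetic data set it cannot be estimated, which is why multiple data sets are released.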
Pages: 142-154
Number of pages: 13
Related papers
50 in total
  • [1] Statistical Disclosure Control for Data Privacy Using Sequence of Generalised Linear Models
    Lee, Min Cherng
    Mitra, Robin
    Lazaridis, Emmanuel
    Lai, An-Chow
    Goh, Yong Kheng
    Yap, Wun-She
    [J]. INFORMATION SECURITY AND PRIVACY, PT I, 2016, 9722 : 77 - 93
  • [2] Data Privacy Preserving Scheme in MANETs
    Bhati, Bhawani Shanker
    Venkataram, Pallapa
    [J]. 2014 WORLD CONGRESS ON INTERNET SECURITY (WORLDCIS), 2014, : 22 - 23
  • [3] A scheme for privacy-preserving data dissemination
    Lilien, Leszek
    Bhargava, Bharat
    [J]. IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART A-SYSTEMS AND HUMANS, 2006, 36 (03): : 502 - 506
  • [4] An IoT data sharing privacy preserving scheme
    Sun, Yan
    Yin, Lihua
    Sun, Zhe
    Tian, Zhihong
    Du, Xiaojiang
    [J]. IEEE INFOCOM 2020 - IEEE CONFERENCE ON COMPUTER COMMUNICATIONS WORKSHOPS (INFOCOM WKSHPS), 2020, : 984 - 990
  • [5] Privacy Preserving Data Access Scheme for IoT Devices
    Jahan, Mosarrat
    Seneviratne, Suranga
    Chu, Ben
    Seneviratne, Aruna
    Jha, Sanjay
    [J]. 2017 IEEE 16TH INTERNATIONAL SYMPOSIUM ON NETWORK COMPUTING AND APPLICATIONS (NCA), 2017, : 217 - 226
  • [6] A Privacy-Preserving Health Data Aggregation Scheme
    Liu, Yining
    Liu, Gao
    Cheng, Chi
    Xia, Zhe
    Shen, Jian
    [J]. KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, 2016, 10 (08): : 3852 - 3864
  • [7] Truthful and privacy-preserving generalized linear models
    Qiu, Yuan
    Liu, Jinyan
    Wang, Di
    [J]. INFORMATION AND COMPUTATION, 2024, 301
  • [8] Privacy-preserving Deep-learning Models for Fingerprint Data Using Differential Privacy
    Mohammadi, Maryam
    Sabry, Farida
    Labda, Wadha
    Malluhi, Qutaibah
    [J]. PROCEEDINGS OF THE 9TH ACM INTERNATIONAL WORKSHOP ON SECURITY AND PRIVACY ANALYTICS, IWSPA 2023, 2023, : 45 - 53
  • [9] On privacy preserving data release of linear dynamic networks
    Lu, Yang
    Zhu, Minghui
    [J]. AUTOMATICA, 2020, 115
  • [10] Privacy Preserving and Efficient Data Collection Scheme for AMI Networks Using Deep Learning
    Ibrahem, Mohamed I.
    Mahmoud, Mohamed
    Fouda, Mostafa M.
    Alsolami, Fawaz
    Alasmary, Waleed
    Shen, Xuemin
    [J]. IEEE INTERNET OF THINGS JOURNAL, 2021, 8 (23) : 17131 - 17146