Multi-Dimensional Randomized Response

Cited by: 6
Authors
Domingo-Ferrer, Josep [1]
Soria-Comas, Jordi [2]
Affiliations
[1] Univ Rovira & Virgili, CYBERCAT Ctr Cybersecur Res Catalonia, Dept Comp Engn & Math, UNESCO Chair Data Privacy, Av Paisos Catalans 26, Tarragona 43007, Catalonia, Spain
[2] Catalan Data Protect Author, Barcelona 08008, Catalonia, Spain
Keywords
Privacy; Estimation; Differential privacy; Data privacy; Phase change random access memory; Clustering algorithms; Protocols; Privacy preserving data publishing; randomized response; curse of dimensionality; local anonymization; multivariate data
DOI
10.1109/TKDE.2020.3045759
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In our data world, a host of not necessarily trusted controllers gather data on individual subjects. To preserve her privacy and, more generally, her informational self-determination, the individual must be given agency over her own data. Maximum agency is afforded by local anonymization, which allows each individual to anonymize her own data before handing them to the data controller. Randomized response (RR) is a local anonymization approach able to yield multi-dimensional full sets of anonymized microdata that are valid for exploratory analysis and machine learning. This is so because an unbiased estimate of the distribution of the true data of individuals can be obtained from their pooled randomized data. Furthermore, RR offers rigorous privacy guarantees. The main weakness of RR is the curse of dimensionality when applied to several attributes: as the number of attributes grows, the accuracy of the estimated true data distribution quickly degrades. We propose several complementary approaches to mitigate the dimensionality problem. First, we present two basic protocols, separate RR on each attribute and joint RR for all attributes, and discuss their limitations. Then we introduce an algorithm to form clusters of attributes so that attributes in different clusters can be viewed as independent and joint RR can be performed within each cluster. After that, we introduce an adjustment algorithm for the randomized data set that repairs some of the accuracy loss caused by assuming independence between attributes when using RR separately on each attribute, or by assuming independence between clusters in cluster-wise RR. We also present empirical work to illustrate the proposed methods.
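The estimation step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' protocol: it assumes a simple k-ary randomized response in which each individual reports her true category with probability p and otherwise reports one of the other k - 1 categories uniformly at random. The function names (randomize, estimate_distribution, encode_joint) and the parameter p are hypothetical and chosen only for illustration. Under that assumption, the expected reported frequency of category v is q + (p - q) * pi_v with q = (1 - p) / (k - 1), so inverting this relation gives an unbiased estimate of the true frequency pi_v.

```python
import random
from collections import Counter


def randomize(value, k, p, rng):
    """Report the true category with probability p; otherwise report one of
    the other k - 1 categories uniformly at random (assumed RR design)."""
    if rng.random() < p:
        return value
    other = rng.randrange(k - 1)          # index among the k - 1 other values
    return other if other < value else other + 1


def estimate_distribution(reports, k, p):
    """Unbiased estimate of the true category frequencies from pooled reports.

    Inverts E[reported freq of v] = q + (p - q) * pi_v, q = (1 - p) / (k - 1).
    Individual estimates may fall slightly outside [0, 1] on small samples.
    """
    q = (1 - p) / (k - 1)
    if abs(p - q) < 1e-12:
        raise ValueError("p equals 1/k: reports carry no information")
    n = len(reports)
    counts = Counter(reports)
    return [(counts[v] / n - q) / (p - q) for v in range(k)]


def encode_joint(record, sizes):
    """Map a tuple of attribute values to one index in the Cartesian product
    of the attribute domains, so joint RR can reuse the functions above."""
    index = 0
    for value, size in zip(record, sizes):
        index = index * size + value
    return index


if __name__ == "__main__":
    rng = random.Random(42)
    k, p = 3, 0.7
    true_values = rng.choices(range(k), weights=[0.5, 0.3, 0.2], k=10000)
    reports = [randomize(v, k, p, rng) for v in true_values]
    print(estimate_distribution(reports, k, p))
```

Joint RR over several attributes can reuse the same functions by passing encode_joint(record, sizes) as the value and the product of the domain sizes as k. That product grows multiplicatively with the number of attributes, so each joint category receives fewer reports for a fixed number of individuals, which is one way to see the dimensionality problem the abstract describes.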
Pages: 4933-4946
Page count: 14