Data Anonymization that Leads to the Most Accurate Estimates of Statistical Characteristics

Authors
Xiang, Gang [1]
Kreinovich, Vladik [2]
Affiliations
[1] Appl Biomath, 100 North Country Rd, Setauket, NY 11733 USA
[2] Univ Texas El Paso, Dept Comp Sci, El Paso, TX 79968 USA
Funding
National Institutes of Health (USA); National Science Foundation (USA)
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
To preserve privacy, we divide the data space into boxes and store, instead of the original data points, only the boxes that contain them. In accordance with current practice, the desired level of privacy is established by requiring that each box contain at least k different records, for a given value k (the larger the value of k, the higher the privacy level). Processing boxes instead of the original exact values introduces uncertainty. In this paper, we find the (asymptotically) optimal subdivision of the data into boxes: the subdivision that, for a given statistical characteristic such as variance, covariance, or correlation, yields the smallest uncertainty at the given privacy level. In areas where the empirical data density is small, boxes containing k points are large, which results in large uncertainty. To avoid this, we propose that, when computing the corresponding characteristic, only data from boxes with a sufficiently large density be used. Deleting data points in this way increases the statistical uncertainty but decreases the uncertainty caused by the privacy-related boxes. We explain how to compute an (asymptotically) optimal threshold, for which the overall uncertainty is (asymptotically) the smallest.
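The idea of replacing points by boxes can be illustrated with a minimal one-dimensional sketch. It is not the paper's optimal subdivision: it assumes a simple grouping of sorted values into consecutive runs of at least k records, uses the sample mean (rather than variance, covariance, or correlation) because its exact range over boxes is easy to compute, and uses box width as a stand-in for low density. The names `boxes_1d`, `mean_bounds`, and `drop_sparse`, as well as the `max_width` parameter, are hypothetical and chosen only for this illustration.

```python
from typing import List, Tuple

# A box is (low, high, count): an interval and the number of records it hides.
Box = Tuple[float, float, int]


def boxes_1d(values: List[float], k: int) -> List[Box]:
    """Group sorted values into consecutive boxes of at least k records each.

    Assumption: a naive equal-count grouping, not an optimized subdivision.
    The last box absorbs any leftover records so that no box has fewer than k.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    xs = sorted(values)
    if len(xs) < k:
        raise ValueError("need at least k records to form one box")
    n_boxes = len(xs) // k
    boxes: List[Box] = []
    for b in range(n_boxes):
        start = b * k
        end = (b + 1) * k if b < n_boxes - 1 else len(xs)
        chunk = xs[start:end]
        boxes.append((chunk[0], chunk[-1], len(chunk)))
    return boxes


def mean_bounds(boxes: List[Box]) -> Tuple[float, float]:
    """Range of possible sample means when each record lies anywhere in its box."""
    n = sum(count for _, _, count in boxes)
    lowest = sum(low * count for low, _, count in boxes) / n
    highest = sum(high * count for _, high, count in boxes) / n
    return lowest, highest


def drop_sparse(boxes: List[Box], max_width: float) -> List[Box]:
    """Keep only boxes no wider than max_width (width used as a low-density proxy)."""
    return [box for box in boxes if box[1] - box[0] <= max_width]


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 50.0]
    all_boxes = boxes_1d(data, k=3)
    print("boxes:", all_boxes)
    print("mean range, all boxes:", mean_bounds(all_boxes))
    dense = drop_sparse(all_boxes, max_width=5.0)
    print("mean range, dense boxes only:", mean_bounds(dense))
```

In this toy data set, the single outlier stretches the second box and widens the range of possible means considerably; discarding that wide box narrows the range at the cost of using fewer records, which is the trade-off the abstract describes. How the threshold should be chosen optimally is the subject of the paper and is not reproduced here.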
Pages: 163-170
Number of pages: 8