Towards Private Key-Value Data Collection with Histogram

Cited by: 0
Authors
Zhang X. [1]
Xu Y. [1]
Fu N. [1]
Meng X. [2]
Affiliations
[1] School of Computer & Information Engineering, Henan University of Economics and Law, Zhengzhou
[2] School of Information, Renmin University of China, Beijing
Funding
National Natural Science Foundation of China
Keywords
Frequency estimation; Key-value data; Local differential privacy; Mean estimation; Randomized response mechanism;
DOI
10.7544/issn1000-1239.2021.20200319
Abstract
Recently, the collection and analysis of user data under local differential privacy has been extended to key-value data. The trade-off between the size and sparsity of the domain and the perturbation method directly constrains the accuracy of collecting and analyzing such data. To remedy the deficiencies caused by the domain size and the perturbation method, this paper uses histogram techniques to propose an efficient solution, called HISKV, for collecting key-value data. HISKV first uses a user-grouping strategy and part of the privacy budget to find the optimal truncation length, which each user then applies to truncate his/her key-value data set. Next, each user samples one key-value pair from the truncated set and processes it with a discretization and perturbation method. To perturb key-value data efficiently, HISKV introduces a novel mechanism, named LRR_KV, which assigns different perturbation probabilities to different keys. Each user applies LRR_KV to add noise to his/her sampled pair and sends the report to a collector. Based on the reports from all users, the collector estimates the frequency of each key and the mean of the values. To evaluate the utility of HISKV, we first theoretically analyze the unbiasedness, variance, and error bound of LRR_KV, and then perform experiments on real and synthetic datasets to compare different methods. The experimental results show that HISKV outperforms its competitors. © 2021, Science Press. All rights reserved.
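The discretize, perturb, and estimate steps mentioned in the abstract can be illustrated with a minimal Python sketch. This is not LRR_KV, whose key-dependent perturbation probabilities are not specified in the abstract. It is a generic binary randomized response on a single value, assuming values lie in [-1, 1] and every user reports on the same key. All function names and parameters here are hypothetical.

```python
import math
import random


def discretize(v, rng):
    """Round v in [-1, 1] to +1 or -1 so that the expected result equals v."""
    return 1 if rng.random() < (1 + v) / 2 else -1


def randomized_response(bit, epsilon, rng):
    """Keep a +1/-1 bit with probability e^eps / (e^eps + 1), else flip it."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1)
    return bit if rng.random() < p else -bit


def estimate_mean(reports, epsilon):
    """Collector side: unbiased mean estimate from perturbed +1/-1 reports.

    E[report] = (2p - 1) * E[bit], so dividing the average by (2p - 1)
    removes the bias introduced by the random flips.
    """
    p = math.exp(epsilon) / (math.exp(epsilon) + 1)
    return sum(reports) / (len(reports) * (2 * p - 1))


def simulate(values, epsilon, seed=0):
    """Assumed setup: each user holds one value and sends one perturbed bit."""
    rng = random.Random(seed)
    reports = [randomized_response(discretize(v, rng), epsilon, rng)
               for v in values]
    return estimate_mean(reports, epsilon)


if __name__ == "__main__":
    # 100000 simulated users who all hold the value 0.3 (illustrative data).
    print(simulate([0.3] * 100_000, epsilon=1.0))
```

With enough users the estimate concentrates around the true mean. A smaller epsilon gives stronger privacy but divides by a smaller (2p - 1), which increases the variance of the estimate.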
Pages: 624-637
Number of pages: 13