Differentially Private Frequent Sequence Mining

Times cited: 1

Authors:

Xu, Shengzhi [1]

Cheng, Xiang [1]

Su, Sen [1]

Xiao, Ke [1]

Xiong, Li [2]

Affiliations:

[1] Beijing Univ Posts & Telecommun, Beijing 100876, Peoples R China

[2] Emory Univ, Atlanta, GA 30322 USA

Source:

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING | 2016 / Vol. 28 / No. 11

Funding:

National Natural Science Foundation of China

Keywords:

Frequent sequence mining; differential privacy; candidate pruning

DOI:

10.1109/TKDE.2016.2601106

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

In this paper, we study the problem of mining frequent sequences under the rigorous differential privacy model. We explore the possibility of designing a differentially private frequent sequence mining (FSM) algorithm that achieves both high data utility and a high degree of privacy. We find that, in differentially private FSM, the amount of required noise is proportional to the number of candidate sequences. If unpromising candidate sequences can be pruned effectively, the utility-privacy tradeoff can be significantly improved. To this end, by leveraging a sampling-based candidate pruning technique, we propose PFS2, a novel differentially private FSM algorithm. It is the first algorithm that supports general gap-constrained FSM in the context of differential privacy. Gap constraints in FSM can be used to limit the mining results to a controlled set of frequent sequences. The core of PFS2 is to use sample databases to prune the candidate sequences generated based on the downward closure property. In particular, we use the noisy local support of candidate sequences in the sample databases to estimate which candidate sequences are potentially frequent. To improve the accuracy of these private estimations, a gap-aware sequence shrinking method is proposed to enforce the length constraint on the sample databases. Moreover, to calibrate the amount of noise required by differential privacy, a gap-aware sensitivity computation method is proposed to obtain the sensitivity of the local support computations under different gap constraints. Furthermore, to decrease the probability of misestimating frequent sequences as infrequent, a threshold relaxation method is proposed to relax the user-specified threshold for the sample databases. Through formal privacy analysis, we show that PFS2 is ε-differentially private.
Extensive experiments on real datasets illustrate that our PFS2 algorithm can privately find frequent sequences with high accuracy.
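The abstract's central observation, that the noise required grows with the number of candidate sequences, can be illustrated with a minimal Python sketch. It is not PFS2: it omits the sampling, sequence shrinking, and gap-aware sensitivity computation, and instead uses the standard Laplace mechanism with the naive sensitivity bound of one per candidate (a single database sequence may contain every candidate). It assumes support is the number of database sequences containing a pattern and that a gap is the number of items skipped between two consecutively matched items. The function names `contains_with_gap`, `support`, `noisy_frequent` and the `relax` parameter are hypothetical.

```python
import math
import random


def contains_with_gap(seq, pattern, min_gap=0, max_gap=math.inf):
    """True if `pattern` occurs in `seq` as a subsequence whose consecutive
    matched positions i < j satisfy min_gap <= j - i - 1 <= max_gap.
    Assumption: the gap counts the items skipped between two matches."""
    if not pattern:
        return True
    # Positions in `seq` where a valid match of the current prefix can end.
    ends = [i for i, x in enumerate(seq) if x == pattern[0]]
    for item in pattern[1:]:
        ends = [
            j for j, x in enumerate(seq)
            if x == item and any(min_gap <= j - i - 1 <= max_gap for i in ends)
        ]
        if not ends:
            return False
    return True


def support(db, pattern, min_gap=0, max_gap=math.inf):
    """Number of database sequences that contain `pattern` under the gap constraint."""
    return sum(contains_with_gap(s, pattern, min_gap, max_gap) for s in db)


def laplace(scale, rng):
    """Laplace(0, scale) sample: difference of two i.i.d. exponentials with rate 1/scale."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_frequent(db, candidates, threshold, epsilon,
                   min_gap=0, max_gap=math.inf, relax=0.0, rng=None):
    """Release candidates whose noisy support reaches threshold - relax.

    Adding or removing one sequence changes each candidate's support by at
    most 1, so the L1 sensitivity of the whole support vector is at most
    len(candidates). Per-count Laplace noise of scale len(candidates)/epsilon
    therefore makes the noisy vector epsilon-DP; thresholding is post-processing.
    `relax` is a hypothetical stand-in for a relaxed threshold."""
    if not candidates:
        return {}
    if rng is None:
        rng = random.Random()
    scale = len(candidates) / epsilon  # noise grows linearly with candidate count
    result = {}
    for p in candidates:
        noisy = support(db, p, min_gap, max_gap) + laplace(scale, rng)
        if noisy >= threshold - relax:
            result[p] = noisy
    return result
```

For example, with `epsilon=1.0` and 1,000 candidates, the naive bound gives a Laplace scale of 1,000 on every count, which can easily exceed the true supports in a modest database. Halving the candidate set halves the scale, which is why pruning unpromising candidates before the private counting step improves the utility-privacy tradeoff.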

Pages: 2910-2926

Number of pages: 17

Similar records (50 in total; first 10 listed):

[1] Differentially private maximal frequent sequence mining
Cheng, Xiang
Su, Sen
Xu, Shengzhi
Tang, Peng
Li, Zhengyi
[J]. COMPUTERS & SECURITY, 2015, 55: 175-192
[2] Differentially Private Frequent Subgraph Mining
Xu, Shengzhi
Su, Sen
Xiong, Li
Cheng, Xiang
Xiao, Ke
[J]. 2016 32ND IEEE INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2016: 229-240
[3] On Differentially Private Frequent Itemset Mining
Zeng, Chen
Naughton, Jeffrey F.
Cai, Jin-Yi
[J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2012, 6(01): 25-36
[4] Locally Differentially Private Frequent Itemset Mining
Wang, Tianhao
Li, Ninghui
Jha, Somesh
[J]. 2018 IEEE SYMPOSIUM ON SECURITY AND PRIVACY (SP), 2018: 127-143
[5] Differentially Private Frequent Sequence Mining via Sampling-based Candidate Pruning
Xu, Shengzhi
Su, Sen
Cheng, Xiang
Li, Zhengyi
Xiong, Li
[J]. 2015 IEEE 31ST INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2015: 1035-1046
[6] Differentially private frequent episode mining over event streams
Qin, Jiawen
Wang, Jinyan
Li, Qiyu
Fang, Shijian
Li, Xianxian
Lei, Lei
[J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2022, 110
[7] Differentially Private Frequent Itemset Mining via Transaction Splitting
Su, Sen
Xu, Shengzhi
Cheng, Xiang
Li, Zhengyi
Yang, Fangchun
[J]. 2016 32ND IEEE INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2016: 1564-1565
[8] Differentially Private Frequent Itemset Mining Against Incremental Updates
Liang, Wenjuan
Chen, Hong
Wu, Yuncheng
Li, Cuiping
[J]. INFORMATION AND COMMUNICATIONS SECURITY (ICICS 2019), 2020, 11999: 649-667
[9] Differentially Private Frequent Itemset Mining via Transaction Splitting
Su, Sen
Xu, Shengzhi
Cheng, Xiang
Li, Zhengyi
Yang, Fangchun
[J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2015, 27(07): 1875-1891
[10] A Two-Phase Algorithm for Differentially Private Frequent Subgraph Mining
Cheng, Xiang
Su, Sen
Xu, Shengzhi
Xiong, Li
Xiao, Ke
Zhao, Mingxing
[J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2018, 30(08): 1411-1425
