Differentially Private Frequent Sequence Mining via Sampling-based Candidate Pruning

Authors
Xu, Shengzhi [1]
Su, Sen [1]
Cheng, Xiang [1]
Li, Zhengyi [1]
Xiong, Li [2]
Affiliations
[1] Beijing Univ Posts & Telecommun, State Key Lab Networking & Switching Technol, Beijing, Peoples R China
[2] Emory Univ, Math & Comp Sci Dept, Atlanta, GA 30322 USA
Funding
U.S. National Science Foundation
Keywords
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, we study the problem of mining frequent sequences under the rigorous differential privacy model. We explore the possibility of designing a differentially private frequent sequence mining (FSM) algorithm that achieves both high data utility and a high degree of privacy. We find that, in differentially private FSM, the amount of required noise is proportional to the number of candidate sequences. If the number of unpromising candidate sequences can be effectively reduced, the tradeoff between utility and privacy can be significantly improved. To this end, by leveraging a sampling-based candidate pruning technique, we propose a novel differentially private FSM algorithm, referred to as PFS2. The core of our algorithm is to use sample databases to further prune the candidate sequences generated based on the downward closure property. In particular, we use the noisy local support of candidate sequences in the sample databases to estimate which sequences are potentially frequent. To improve the accuracy of these private estimates, we propose a sequence shrinking method that enforces the length constraint on the sample databases. Moreover, to decrease the probability of misestimating frequent sequences as infrequent, we propose a threshold relaxation method that relaxes the user-specified threshold for the sample databases. Through formal privacy analysis, we show that our PFS2 algorithm is epsilon-differentially private. Extensive experiments on real datasets show that our PFS2 algorithm can privately find frequent sequences with high accuracy.
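The pruning idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' PFS2 implementation, and everything in it is an assumption made for illustration: the function names, the `relax` factor, the use of plain truncation as a stand-in for the sequence shrinking method (which the abstract does not specify), and the Laplace noise scale. The sketch assumes all candidates have the same length k, so that one sequence of length at most `max_len` can change the support of at most min(|candidates|, C(max_len, k)) candidates.

```python
import math
import random


def is_subsequence(candidate, sequence):
    """True if `candidate` occurs in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in candidate)


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample: difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def prune_candidates(candidates, sample_db, threshold, epsilon,
                     max_len, relax=0.8, rng=None):
    """Keep candidates whose noisy local support in `sample_db` reaches a
    relaxed threshold. Hypothetical helper, assumptions as stated above.

    candidates: list of equal-length tuples (candidate k-sequences)
    threshold:  user-specified relative support in [0, 1]
    relax:      assumed relaxation factor in (0, 1]
    """
    rng = rng or random.Random()
    k = len(candidates[0])
    # Stand-in for sequence shrinking: enforce the length constraint
    # by simple truncation (an assumption, not the paper's method).
    shrunk = [tuple(seq[:max_len]) for seq in sample_db]
    # One shrunk sequence supports at most C(max_len, k) distinct
    # k-subsequences, each changing a support count by 1.
    sensitivity = min(len(candidates), math.comb(max_len, k))
    scale = sensitivity / epsilon
    relaxed = relax * threshold * len(shrunk)
    kept = []
    for cand in candidates:
        support = sum(is_subsequence(cand, seq) for seq in shrunk)
        if support + laplace_noise(scale, rng) >= relaxed:
            kept.append(cand)
    return kept
```

Fewer candidates or a shorter `max_len` gives a smaller noise scale in this sketch, which mirrors the abstract's observation that the required noise grows with the number of candidate sequences. Lowering `relax` trades extra surviving candidates for a lower chance of discarding a truly frequent sequence.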
Pages: 1035-1046
Number of pages: 12
Related papers
50 in total
  • [1] Differentially Private Frequent Sequence Mining
    Xu, Shengzhi
    Cheng, Xiang
    Su, Sen
    Xiao, Ke
    Xiong, Li
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2016, 28 (11) : 2910 - 2926
  • [2] Differentially private maximal frequent sequence mining
    Cheng, Xiang
    Su, Sen
    Xu, Shengzhi
    Tang, Peng
    Li, Zhengyi
    [J]. COMPUTERS & SECURITY, 2015, 55 : 175 - 192
  • [3] Differentially Private High-Dimensional Data Publication via Sampling-Based Inference
    Chen, Rui
    Xiao, Qian
    Zhang, Yu
    Xu, Jianliang
    [J]. KDD'15: PROCEEDINGS OF THE 21ST ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2015, : 129 - 138
  • [4] Differentially Private Frequent Itemset Mining via Transaction Splitting
    Su, Sen
    Xu, Shengzhi
    Cheng, Xiang
    Li, Zhengyi
    Yang, Fangchun
    [J]. 2016 32ND IEEE INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2016, : 1564 - 1565
  • [5] A sampling-based method for mining frequent patterns from databases
    Chen, YL
    Ho, CY
    [J]. FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, PT 2, PROCEEDINGS, 2005, 3614 : 536 - 545
  • [6] Differentially Private Frequent Itemset Mining via Transaction Splitting
    Su, Sen
    Xu, Shengzhi
    Cheng, Xiang
    Li, Zhengyi
    Yang, Fangchun
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2015, 27 (07) : 1875 - 1891
  • [7] Differentially Private Frequent Subgraph Mining
    Xu, Shengzhi
    Su, Sen
    Xiong, Li
    Cheng, Xiang
    Xiao, Ke
    [J]. 2016 32ND IEEE INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2016, : 229 - 240
  • [8] On Differentially Private Frequent Itemset Mining
    Zeng, Chen
    Naughton, Jeffrey F.
    Cai, Jin-Yi
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2012, 6 (01): : 25 - 36
  • [9] Locally Differentially Private Frequent Itemset Mining
    Wang, Tianhao
    Li, Ninghui
    Jha, Somesh
    [J]. 2018 IEEE SYMPOSIUM ON SECURITY AND PRIVACY (SP), 2018, : 127 - 143
  • [10] Full duplicate candidate pruning for frequent connected subgraph mining
    Gago-Alonso, Andres
    Carrasco-Ochoa, Jesus A.
    Medina-Pagola, Jose E.
    Fco. Martinez-Trinidad, Jose
    [J]. INTEGRATED COMPUTER-AIDED ENGINEERING, 2010, 17 (03) : 211 - 225