Differentially Private Frequent Sequence Mining

Cited by: 1
Authors
Xu, Shengzhi [1]
Cheng, Xiang [1]
Su, Sen [1]
Xiao, Ke [1]
Xiong, Li [2]
Affiliations
[1] Beijing Univ Posts & Telecommun, Beijing 100876, Peoples R China
[2] Emory Univ, Atlanta, GA 30322 USA
Funding
National Natural Science Foundation of China;
Keywords
Frequent sequence mining; differential privacy; candidate pruning;
DOI
10.1109/TKDE.2016.2601106
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we study the problem of mining frequent sequences under the rigorous differential privacy model. We explore the possibility of designing a differentially private frequent sequence mining (FSM) algorithm that achieves both high data utility and a high degree of privacy. We find that, in differentially private FSM, the amount of required noise is proportional to the number of candidate sequences. If unpromising candidate sequences can be pruned effectively, the utility-privacy tradeoff can be significantly improved. To this end, by leveraging a sampling-based candidate pruning technique, we propose PFS2, a novel differentially private FSM algorithm. It is the first algorithm that supports general gap-constrained FSM in the context of differential privacy. The gap constraints in FSM can be used to limit the mining results to a controlled set of frequent sequences. The core of our PFS2 algorithm is to use sample databases to prune the candidate sequences generated based on the downward closure property. In particular, we use the noisy local support of candidate sequences in the sample databases to estimate which candidate sequences are potentially frequent. To improve the accuracy of such private estimations, a gap-aware sequence shrinking method is proposed to enforce the length constraint on the sample databases. Moreover, to calibrate the amount of noise required by differential privacy, a gap-aware sensitivity computation method is proposed to obtain the sensitivity of the local support computations with different gap constraints. Furthermore, to decrease the probability of misestimating frequent sequences as infrequent, a threshold relaxation method is proposed to relax the user-specified threshold for the sample databases. Through formal privacy analysis, we show that our PFS2 algorithm is epsilon-differentially private. Extensive experiments on real datasets illustrate that our PFS2 algorithm can privately find frequent sequences with high accuracy.
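The pruning idea described in the abstract (noisy local support in a sample database, compared against a relaxed threshold) can be illustrated with a minimal Python sketch. This is not the PFS2 algorithm itself: the use of Laplace noise, the function names, the `relax` factor, and the caller-supplied `sensitivity` are all assumptions made for illustration, and gap constraints and sequence shrinking are omitted.

```python
import random


def is_subsequence(pattern, sequence):
    """Return True if pattern occurs in sequence in order (gaps unrestricted)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def prune_candidates(candidates, sample_db, threshold, epsilon,
                     sensitivity=1.0, relax=0.8, rng=None):
    """Keep candidates whose noisy local support reaches a relaxed threshold.

    Assumptions (hypothetical, not taken from the paper):
      - sensitivity is the L1 sensitivity of the whole vector of candidate
        supports, computed elsewhere and passed in by the caller;
      - relax is a factor in (0, 1] that lowers the threshold on the sample.
    """
    rng = rng or random.Random()
    scale = sensitivity / epsilon          # Laplace scale for this budget
    relaxed_threshold = relax * threshold  # lowered to reduce false pruning
    kept = []
    for cand in candidates:
        # Local support: number of sample sequences containing the candidate.
        support = sum(is_subsequence(cand, seq) for seq in sample_db)
        if support + laplace_noise(scale, rng) >= relaxed_threshold:
            kept.append(cand)
    return kept
```

A smaller `epsilon` or a larger `sensitivity` widens the noise, which makes the pruning decision less reliable; lowering `relax` trades extra surviving candidates for fewer frequent sequences being pruned by mistake.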
Pages: 2910 - 2926
Page count: 17
Related papers
50 in total
  • [11] PrivTS: Differentially Private Frequent Time-Constrained Sequential Pattern Mining
    Li, Yanhui
    Wang, Guoren
    Yuan, Ye
    Cao, Xin
    Yuan, Long
    Lin, Xuemin
    [J]. DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2018), PT II, 2018, 10828 : 92 - 111
  • [12] Differentially Private Frequent Itemset Mining from Smart Devices in Local Setting
    Zhang, Xinyuan
    Huang, Liusheng
    Fang, Peng
    Wang, Shaowei
    Zhu, Zhenyu
    Xu, Hongli
    [J]. WIRELESS ALGORITHMS, SYSTEMS, AND APPLICATIONS, WASA 2017, 2017, 10251 : 433 - 444
  • [13] A Graph-Based Differentially Private Algorithm for Mining Frequent Sequential Patterns
    Nunez-del-Prado, Miguel
    Maehara-Aliaga, Yoshitomi
    Salas, Julian
    Alatrista-Salas, Hugo
    Megias, David
    [J]. APPLIED SCIENCES-BASEL, 2022, 12 (04):
  • [14] PrivTS: Differentially private frequent time-constrained sequential pattern mining
    Li, Yanhui
    Wang, Guoren
    Yuan, Ye
    Cao, Xin
    Yuan, Long
    Lin, Xuemin
    [J]. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2018, 10828 LNCS : 92 - 111
  • [15] Differentially Private Two-Party Top-k Frequent Item Mining
    Tong, Wei
    Chen, Wenjie
    Han, Tingxuan
    Chen, Haoyu
    Zhong, Sheng
    [J]. 2023 IEEE 43RD INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS, ICDCS, 2023, : 166 - 177
  • [16] DP-Apriori: A differentially private frequent itemset mining algorithm based on transaction splitting
    Cheng, Xiang
    Su, Sen
    Xu, Shengzhi
    Li, Zhengyi
    [J]. COMPUTERS & SECURITY, 2015, 50 : 74 - 90
  • [17] A Differentially Private Scheme for Top-k Frequent Itemsets Mining Over Data Streams
    Liang, Wen-Juan
    Chen, Hong
    Zhao, Su-Yun
    Li, Cui-Ping
    [J]. Jisuanji Xuebao/Chinese Journal of Computers, 2021, 44 (04): 741 - 760
  • [18] Locally Differentially Private Frequent Pattern Mining for High-Dimensional Data in Mobile Smart Services
    Li, Qi
    Peng, Shunshun
    Wu, Haonan
    Ran, Ruisheng
    Li, Yong
    Zhou, Mingliang
    Guo, Taolin
    Mao, Qin
    [J]. INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2022, 36 (15)
  • [19] Frequent Patterns Mining in DNA Sequence
    Deng, Na
    Chen, Xu
    Li, Desheng
    Xiong, Caiquan
    [J]. IEEE ACCESS, 2019, 7 : 108400 - 108410
  • [20] Preserving private knowledge in frequent pattern mining
    Wang, Zhihui
    Wang, Wei
    Shi, Baile
    Boey, S. H.
    [J]. ICDM 2006: SIXTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, WORKSHOPS, 2006, : 530 - 534