Privacy-preserving mechanisms for k-modes clustering

Cited by: 31
Author
Huu Hiep Nguyen [1]
Affiliation
[1] Duy Tan Univ, Inst Res & Dev, P809 7-25 Quang Trung, Danang 550000, Vietnam
Keywords
Differential privacy; k-modes clustering; DP-Modes-Lloyd; DP-Modes-MCMC; DP-modes-synopsis; Friedman test; ALGORITHM;
DOI
10.1016/j.cose.2018.06.003
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Clustering categorical data is an important data mining task with rich applications. As an extension of k-means to categorical data, the k-modes algorithm has become a popular clustering tool due to its simplicity and efficiency. Many improvements to k-modes, such as better initialization techniques and more effective dissimilarity scores, have been proposed recently. However, the problem of running k-modes in a privacy-preserving manner is rarely considered. In this paper, we address the privacy-preserving k-modes problem using differential privacy, a formal and rigorous definition of privacy for data publication. Differential privacy guarantees that the presence or absence of any single item in the input dataset cannot be distinguished by looking at the computation output. We analyze the challenges of differentially private k-modes relative to its k-means counterpart. We then propose several schemes in both interactive and non-interactive settings. We prove that our mechanisms satisfy differential privacy and run in time linear in the number of data points. Evaluation over fifteen real datasets shows that we can achieve useful privacy-preserving clustering outputs. In terms of clustering cost, the interactive approaches perform better than the non-interactive schemes and the solution adapted from k-means. (C) 2018 Elsevier Ltd. All rights reserved.
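To make the two ingredients named in the abstract concrete, the following is a minimal sketch of a single noisy mode update for one cluster. It is not the paper's DP-Modes-Lloyd, DP-Modes-MCMC, or DP-modes-synopsis mechanism. It assumes a plain Hamming dissimilarity, Laplace noise added to per-attribute category counts, and an even split of the privacy budget across attributes. The names `hamming`, `laplace_noise`, and `noisy_mode` are hypothetical and chosen for illustration only.

```python
import random


def hamming(a, b):
    """Simple k-modes dissimilarity: number of attributes on which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_mode(records, domains, epsilon, rng):
    """Pick a cluster mode from Laplace-noised category counts.

    Assumption (not from the source): the budget epsilon is split evenly over
    the d attributes, and adding or removing one record changes each
    attribute's histogram by at most 1, so the noise scale is d / epsilon.
    """
    d = len(domains)
    scale = d / epsilon
    mode = []
    for j, domain in enumerate(domains):
        # Exact count of each category of attribute j, then perturb it.
        noisy_counts = [
            sum(1 for r in records if r[j] == value) + laplace_noise(scale, rng)
            for value in domain
        ]
        # The noisy mode is the category with the largest perturbed count.
        best = max(range(len(domain)), key=lambda i: noisy_counts[i])
        mode.append(domain[best])
    return tuple(mode)


# Toy cluster with two categorical attributes.
records = [("a", "x"), ("a", "y"), ("b", "x"), ("a", "x")]
domains = [("a", "b"), ("x", "y")]
rng = random.Random(0)

# With a very large epsilon the noise is negligible and the true mode is returned.
mode = noisy_mode(records, domains, epsilon=1e6, rng=rng)
cost = sum(hamming(r, mode) for r in records)
```

Smaller values of `epsilon` give stronger privacy but make the selected mode more likely to differ from the true one, which in turn raises the clustering cost.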
Pages: 60-75
Number of pages: 16