Differentially Private Clustering in High-Dimensional Euclidean Spaces

Cited by: 0
Authors
Balcan, Maria-Florina [1]
Dick, Travis [1]
Liang, Yingyu [2]
Mou, Wenlong [3]
Zhang, Hongyang [1]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[2] Princeton Univ, Princeton, NJ 08544 USA
[3] Peking Univ, Beijing, Peoples R China
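Keywords
None listed
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405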
Abstract
We study the problem of clustering sensitive data while preserving the privacy of individuals represented in the dataset, which has broad applications in practical machine learning and data analysis tasks. Although the problem has been widely studied in the context of low-dimensional, discrete spaces, much remains unknown concerning private clustering in high-dimensional Euclidean spaces $\mathbb{R}^d$. In this work, we give differentially private and efficient algorithms achieving strong guarantees for k-means and k-median clustering when $d = \Omega(\mathrm{polylog}(n))$. Our algorithm achieves clustering loss at most $\log^3(n)\,\mathrm{OPT} + \mathrm{poly}(\log n, d, k)$, advancing the state-of-the-art result of $\sqrt{d}\,\mathrm{OPT} + \mathrm{poly}(\log n, d^d, k^d)$. We also study the case where the data points are $s$-sparse and show that the clustering loss can scale logarithmically with $d$, i.e., $\log^3(n)\,\mathrm{OPT} + \mathrm{poly}(\log n, \log d, k, s)$. Experiments on both synthetic and real datasets verify the effectiveness of the proposed method.
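The guarantees above are stated in terms of the k-means and k-median clustering loss, where OPT is the smallest loss achievable by any set of k centers. As a reference point, the sketch below computes both objectives under the standard definitions, assuming every point is charged to its nearest center (squared Euclidean distance for k-means, plain Euclidean distance for k-median). It involves no privacy mechanism and is not the authors' algorithm; the function name `clustering_cost` is hypothetical.

```python
import math
from typing import Iterable, Sequence


def clustering_cost(
    points: Iterable[Sequence[float]],
    centers: Sequence[Sequence[float]],
    objective: str = "kmeans",
) -> float:
    """Clustering loss of `points` when each point is charged to its nearest center.

    objective="kmeans":  sum of squared Euclidean distances.
    objective="kmedian": sum of (unsquared) Euclidean distances.
    Hypothetical, non-private helper for illustration only.
    """
    if objective not in ("kmeans", "kmedian"):
        raise ValueError(f"unknown objective: {objective!r}")
    if not centers:
        raise ValueError("need at least one center")
    total = 0.0
    for p in points:
        # Euclidean distance from p to its closest center
        nearest = min(math.dist(p, c) for c in centers)
        # k-means squares the distance; k-median uses it as-is
        total += nearest ** 2 if objective == "kmeans" else nearest
    return total


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 4.0), (10.0, 0.0)]
    ctrs = [(0.0, 1.0), (10.0, 0.0)]
    print(clustering_cost(pts, ctrs))             # distances 1, 3, 0 -> 1 + 9 + 0 = 10.0
    print(clustering_cost(pts, ctrs, "kmedian"))  # 1 + 3 + 0 = 4.0
```

OPT is the minimum of this cost over all choices of k centers; the result above bounds the private algorithm's cost by $\log^3(n)\,\mathrm{OPT}$ plus an additive term that is polynomial in $\log n$, $d$, and $k$.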
Pages: 10