Differentially Private K-Means Publishing with Distributed Dimensions

Authors
Zhu, Boyu [1]
Zhang, Yuan [1]
Chen, Tingting [2]
Zhong, Sheng [1]
Affiliations
[1] Nanjing Univ, State Key Lab Novel Software Technol, Comp Sci & Technol Dept, Nanjing, Peoples R China
[2] Calif State Polytech Univ Pomona, Dept Comp Sci, Coll Sci, Pomona, CA USA
Funding
National Science Foundation (USA); National Natural Science Foundation of China
DOI
10.1109/CSCWD61410.2024.10580021
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
In this paper, we address dataset privacy concerns that arise when publishing k-means clustering results in a distributed-dimension setting. Leveraging differential privacy mechanisms, we propose a novel framework that integrates a differentially private classifier, constructed through voting based on raw clustering results, with an enhanced generative adversarial network (GAN) that simulates the classifier's behavior in inferring class labels for a public dataset. Our approach generates synthetic clustering results that mimic real outcomes in classification tasks, ensuring differential privacy while minimizing noise. Our contributions include a comprehensive exploration of the privacy issues, the introduction of a novel privacy-preserving k-means clustering framework, and theoretical analyses of its sensitivity and differential privacy guarantees. Evaluation on the MNIST dataset demonstrates the effectiveness of the framework, which achieves 82.22% accuracy with a (10.48, 10^{-9})-differential privacy guarantee, compared to 83.45% accuracy without privacy preservation.
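The abstract mentions a differentially private classifier built by voting over raw clustering results, but it does not specify the voting mechanism. As a rough illustration only, the sketch below shows one common way to privatize a vote: add Laplace noise to the per-class vote counts and report the noisy maximum. The function names, the Laplace noise choice, and the scale of 2/epsilon (one changed vote shifts two counts by one each) are assumptions made for this sketch and are not taken from the paper.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_vote(votes, num_classes, epsilon, rng):
    """Return the class with the largest Laplace-perturbed vote count.

    votes: list of integer class labels, one per voter (hypothetical input).
    epsilon: per-query privacy parameter (assumed; not the paper's budget).
    """
    counts = [0] * num_classes
    for v in votes:
        counts[v] += 1
    # Changing a single vote alters two counts by 1 each, so the L1
    # sensitivity of the count vector is 2; scale the noise accordingly.
    scale = 2.0 / epsilon
    noisy = [c + laplace_noise(scale, rng) for c in counts]
    return max(range(num_classes), key=lambda k: noisy[k])


if __name__ == "__main__":
    rng = random.Random(0)
    # Five hypothetical voters labelling one public sample among 3 classes.
    print(noisy_vote([1, 1, 2, 1, 0], num_classes=3, epsilon=1.0, rng=rng))
```

With a small epsilon the returned label may differ from the true majority; that randomness is what provides the privacy protection for each individual vote in this kind of mechanism.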
Pages: 3263-3268
Page count: 6