Differentially Private K-Means Clustering Applied to Meter Data Analysis and Synthesis

Cited by: 11
Authors
Ravi, Nikhil [1]
Scaglione, Anna [1]
Kadam, Sachin [2, 3]
Gentz, Reinhard [4, 5]
Peisert, Sean [4]
Lunghino, Brent [6]
Levijarvi, Emmanuel [7]
Shumavon, Aram [8]
Affiliations
[1] Cornell Tech, Dept Elect & Comp Engn, New York, NY 10044 USA
[2] Arizona State Univ, Sch Elect Comp & Energy Engn, Tempe, AZ 85281 USA
[3] Sungkyunkwan Univ, Suwon 16419, Gyeonggi, South Korea
[4] Lawrence Berkeley Natl Lab, Computat Res, Berkeley, CA 94720 USA
[5] Amazon, Networking Dept, Seattle, WA 98170 USA
[6] Kevala Inc, Data Sci & Methodol Implementat, San Francisco, CA 94133 USA
[7] Kevala Inc, Software Engn Dept, San Francisco, CA 94133 USA
[8] Kevala Inc, San Francisco, CA 94133 USA
Keywords
Differential privacy; clustering; smart grids; summary statistics; synthetic load generation; NOISE;
DOI
10.1109/TSG.2022.3184252
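Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code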
0808 ; 0809 ;
Abstract
The proliferation of smart meters has resulted in a large amount of data being generated. It is increasingly apparent that methods are required to allow a variety of stakeholders to leverage the data in a manner that preserves the privacy of the consumers. The sector is scrambling to define policies, such as the so-called '15/15 rule', to respond to this need. However, the current policies fail to adequately guarantee privacy. In this paper, we address the problem of allowing third parties to apply K-means clustering, obtaining customer labels and centroids for a set of load time series, by applying the framework of differential privacy. We leverage the method to design an algorithm that generates differentially private synthetic load data consistent with the labeled data. We test our algorithm's utility by answering summary statistics queries, such as average daily load profiles, for a 2-dimensional synthetic dataset and a real-world power load dataset.
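To make the idea of differentially private K-means concrete, the following is a minimal sketch of a generic Laplace-noised Lloyd iteration. It is not the algorithm from the paper, whose details are not given in this record. The function names `laplace` and `dp_kmeans`, the even split of the privacy budget across iterations, and the assumption that every coordinate has been scaled to [0, 1] are all illustrative choices made here.

```python
import random


def laplace(rng, scale):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_kmeans(points, k, epsilon, n_iter=5, seed=0):
    """Illustrative noisy Lloyd iterations (assumption: all coordinates in [0, 1])."""
    rng = random.Random(seed)
    d = len(points[0])
    # Assumed budget split: each iteration spends epsilon / n_iter,
    # half on the cluster counts and half on the cluster sums.
    eps_step = epsilon / (2 * n_iter)
    # Data-independent initialization, so it costs no privacy budget.
    centroids = [[rng.random() for _ in range(d)] for _ in range(k)]
    for _ in range(n_iter):
        counts = [0] * k
        sums = [[0.0] * d for _ in range(k)]
        for p in points:
            # Assign the point to its nearest current centroid.
            j = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            counts[j] += 1
            for t in range(d):
                sums[j][t] += p[t]
        for j in range(k):
            # Adding or removing one record changes one count by at most 1.
            noisy_count = counts[j] + laplace(rng, 1.0 / eps_step)
            if noisy_count >= 1.0:
                # One record changes a cluster sum by at most d in L1 norm.
                centroids[j] = [
                    min(1.0, max(0.0, (sums[j][t] + laplace(rng, d / eps_step)) / noisy_count))
                    for t in range(d)
                ]
    return centroids
```

A call such as `dp_kmeans(points, k=3, epsilon=1.0)` returns `k` noisy centroids. Smaller values of `epsilon` inject more noise, which is the privacy and utility trade-off that the abstract's summary statistics are meant to probe.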
Pages: 4801-4814
Page count: 14