Privacy-preserving data publishing for cluster analysis

Cited by: 40
Authors
Fung, Benjamin C. M. [1]
Wang, Ke [2]
Wang, Lingyu [1]
Hung, Patrick C. K. [3]
Affiliations
[1] Concordia Univ, Concordia Inst Informat Syst Engn, Montreal, PQ H3G 1M8, Canada
[2] Simon Fraser Univ, Sch Comp Sci, Burnaby, BC V5A 1S6, Canada
[3] Univ Ontario, Inst Technol, Fac Business & Informat Technol, Oshawa, ON L1H 7K4, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Privacy; Knowledge discovery; Anonymity; Cluster analysis; k-anonymity; Model
DOI
10.1016/j.datak.2008.12.001
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Releasing person-specific data could potentially reveal sensitive information about individuals. k-anonymization is a promising privacy protection mechanism in data publishing. Although substantial research has been conducted on k-anonymization and its extensions in recent years, only a few prior works have considered releasing data for some specific purpose of data analysis. This paper presents a practical data publishing framework for generating a masked version of data that preserves both individual privacy and information usefulness for cluster analysis. Experiments on real-life data suggest that by focusing on preserving cluster structure in the masking process, the cluster quality is significantly better than the cluster quality of the masked data without such focus. The major challenge of masking data for cluster analysis is the lack of class labels that could be used to guide the masking process. Our approach converts the problem into the counterpart problem for classification analysis, wherein class labels encode the cluster structure in the data, and presents a framework to evaluate the cluster quality on the masked data. (C) 2008 Elsevier B.V. All rights reserved.
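The abstract's two central ideas, k-anonymity and the use of cluster labels as class labels to judge a masking, can be illustrated with a small sketch. This is not the paper's algorithm: the records, the attribute names (`age`, `zip`, `cluster`), the `generalize_age` masking step, and the `label_purity` measure are all hypothetical choices made for illustration, and the cluster labels are assumed to have been produced beforehand by some clustering of the raw data.

```python
from collections import Counter, defaultdict


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    counts = Counter(tuple(r[a] for a in quasi_identifiers) for r in records)
    return all(c >= k for c in counts.values())


def generalize_age(record, width=10):
    """Hypothetical masking step: replace an exact age with a range of the given width."""
    low = (record["age"] // width) * width
    masked = dict(record)
    masked["age"] = f"{low}-{low + width - 1}"
    return masked


def label_purity(records, quasi_identifiers, label="cluster"):
    """Fraction of records whose label equals the majority label of their group.

    Records sharing the same quasi-identifier values form one group. A value
    near 1.0 suggests the masking kept records from different clusters apart.
    """
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[a] for a in quasi_identifiers)].append(r[label])
    majority = sum(Counter(labels).most_common(1)[0][1] for labels in groups.values())
    return majority / len(records)


# Toy data: the "cluster" column stands in for labels taken from a prior clustering.
raw = [
    {"age": 23, "zip": "H3G", "cluster": 0},
    {"age": 27, "zip": "H3G", "cluster": 0},
    {"age": 34, "zip": "H3G", "cluster": 1},
    {"age": 38, "zip": "H3G", "cluster": 1},
]
qid = ["age", "zip"]

fine = [generalize_age(r, width=10) for r in raw]    # ages become 20-29 / 30-39
coarse = [generalize_age(r, width=20) for r in raw]  # all ages become 20-39

print(is_k_anonymous(raw, qid, 2), is_k_anonymous(fine, qid, 2))
print(label_purity(fine, qid), label_purity(coarse, qid))
```

In this toy data the 10-year generalization satisfies 2-anonymity while keeping the two clusters in separate groups, whereas the 20-year generalization merges them into one group. A masking procedure guided by the cluster labels would, under these assumptions, prefer the first.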
Pages: 552-575
Page count: 24
Related papers
50 in total
  • [1] Privacy-Preserving Data Publishing
    Liu, Ruilin
    Wang, Hui
    [J]. 2010 IEEE 26TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING WORKSHOPS (ICDE 2010), 2010: 305 - 308
  • [2] Privacy-Preserving Medical Reports Publishing for Cluster Analysis
    Hmood, Ali K.
    Fung, Benjamin C. M.
    Iqbal, Farkhund
    [J]. 2014 6TH INTERNATIONAL CONFERENCE ON NEW TECHNOLOGIES, MOBILITY AND SECURITY (NTMS), 2014,
  • [3] Privacy-Preserving Data Publishing
    Chen, Bee-Chung
    Kifer, Daniel
    LeFevre, Kristen
    Machanavajjhala, Ashwin
    [J]. FOUNDATIONS AND TRENDS IN DATABASES, 2009, 2 (1-2): 1 - 167
  • [4] Privacy-Preserving Sequential Data Publishing
    Wang, Huili
    Ma, Wenping
    Zheng, Haibin
    Liang, Zhi
    Wu, Qianhong
    [J]. NETWORK AND SYSTEM SECURITY, NSS 2019, 2019, 11928 : 596 - 614
  • [5] Privacy-Preserving Big Data Publishing
    Zakerzadeh, Hessam
    Aggarwal, Charu C.
    Barker, Ken
    [J]. PROCEEDINGS OF THE 27TH INTERNATIONAL CONFERENCE ON SCIENTIFIC AND STATISTICAL DATABASE MANAGEMENT, 2015,
  • [6] Privacy-Preserving Publishing of Hierarchical Data
    Ozalp, Ismet
    Gursoy, Mehmet Emre
    Nergiz, Mehmet Ercan
    Saygin, Yucel
    [J]. ACM TRANSACTIONS ON PRIVACY AND SECURITY, 2016, 19 (03)
  • [7] Inference Analysis in Privacy-Preserving Data Re-publishing
    Wang, Guan
    Zhu, Zutao
    Du, Wenliang
    Teng, Zhouxuan
    [J]. ICDM 2008: EIGHTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2008: 1079 - 1084
  • [8] Personalized Privacy-Preserving Trajectory Data Publishing
    Lu Qiwei
    Wang Caimei
    Xiong Yan
    Xia Huihua
    Huang Wenchao
    Gong Xudong
    [J]. CHINESE JOURNAL OF ELECTRONICS, 2017, 26 (02) : 285 - 291
  • [9] An efficient privacy-preserving approach for data publishing
    Xinyu Qian
    Xinning Li
    Zhiping Zhou
    [J]. Journal of Ambient Intelligence and Humanized Computing, 2023, 14 : 2077 - 2093
  • [10] Privacy-Preserving Continuous Event Data Publishing
    Rafiei, Majid
    van der Aalst, Wil M. P.
    [J]. BUSINESS PROCESS MANAGEMENT FORUM (BPM 2021), 2021, 427 : 178 - 194