Privacy-preserving data publishing for cluster analysis

Cited by: 40
Authors
Fung, Benjamin C. M. [1]
Wang, Ke [2]
Wang, Lingyu [1]
Hung, Patrick C. K. [3]
Affiliations
[1] Concordia Univ, Concordia Inst Informat Syst Engn, Montreal, PQ H3G 1M8, Canada
[2] Simon Fraser Univ, Sch Comp Sci, Burnaby, BC V5A 1S6, Canada
[3] Univ Ontario Inst Technol, Fac Business & Informat Technol, Oshawa, ON L1H 7K4, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
Privacy; Knowledge discovery; Anonymity; Cluster analysis; k-anonymity; Model
DOI
10.1016/j.datak.2008.12.001
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Releasing person-specific data could reveal sensitive information about individuals. k-anonymization is a promising privacy protection mechanism in data publishing. Although substantial research has been conducted on k-anonymization and its extensions in recent years, only a few prior works have considered releasing data for a specific data analysis purpose. This paper presents a practical data publishing framework for generating a masked version of data that preserves both individual privacy and information usefulness for cluster analysis. Experiments on real-life data suggest that when the masking process focuses on preserving cluster structure, the cluster quality of the masked data is significantly better than when it does not. The major challenge of masking data for cluster analysis is the lack of class labels that could be used to guide the masking process. Our approach converts the problem into the counterpart problem for classification analysis, wherein class labels encode the cluster structure in the data, and presents a framework to evaluate the cluster quality on the masked data. (C) 2008 Elsevier B.V. All rights reserved.
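The idea in the abstract (cluster the raw data, treat the cluster labels as class labels that guide the masking, then re-cluster the masked data and compare) can be illustrated with a toy sketch. This is not the paper's algorithm: the single quasi-identifier (age), the fixed-width interval generalization, the candidate widths, the purity score, and every function name below are assumptions chosen only to keep the example small.

```python
from collections import Counter

def two_means_1d(values, iters=20):
    """Tiny 1-D k-means with k=2 (illustrative stand-in for any clustering method)."""
    c0, c1 = min(values), max(values)
    labels = []
    for _ in range(iters):
        labels = [0 if abs(v - c0) <= abs(v - c1) else 1 for v in values]
        g0 = [v for v, l in zip(values, labels) if l == 0]
        g1 = [v for v, l in zip(values, labels) if l == 1]
        if g0:
            c0 = sum(g0) / len(g0)
        if g1:
            c1 = sum(g1) / len(g1)
    return labels

def generalize(age, width):
    """Replace an exact age with the fixed-width interval that contains it."""
    lo = (age // width) * width
    return (lo, lo + width - 1)

def is_k_anonymous(masked, k):
    """Every masked quasi-identifier value must be shared by at least k records."""
    return all(count >= k for count in Counter(masked).values())

def purity(masked, labels):
    """Fraction of records carrying the majority class label of their masked group."""
    groups = {}
    for m, l in zip(masked, labels):
        groups.setdefault(m, Counter())[l] += 1
    return sum(max(c.values()) for c in groups.values()) / len(labels)

def label_guided_mask(ages, labels, k, widths=(5, 10, 20, 40, 80)):
    """Among k-anonymous generalizations, keep the one that best preserves the
    class labels; ties go to the narrower interval. Hypothetical heuristic."""
    best = None
    for w in widths:
        masked = [generalize(a, w) for a in ages]
        if not is_k_anonymous(masked, k):
            continue
        candidate = (purity(masked, labels), -w, masked)
        if best is None or candidate[:2] > best[:2]:
            best = candidate
    if best is None:
        raise ValueError("no candidate width satisfies k-anonymity")
    return best[2], -best[1]

def agreement(a, b):
    """Share of records assigned consistently by two 2-cluster labelings
    (max over the two possible label matchings)."""
    same = sum(x == y for x, y in zip(a, b))
    return max(same, len(a) - same) / len(a)

# Made-up toy data: one quasi-identifier (age) for ten records.
ages = [21, 23, 24, 26, 28, 61, 63, 64, 66, 68]

raw_labels = two_means_1d(ages)                       # step 1: cluster raw data
masked, width = label_guided_mask(ages, raw_labels, k=3)  # step 2: label-guided masking
midpoints = [(lo + hi) / 2 for lo, hi in masked]
masked_labels = two_means_1d(midpoints)               # step 3: cluster masked data
score = agreement(raw_labels, masked_labels)          # step 4: compare cluster structure
print(width, score)
```

On this toy input, width 5 fails 3-anonymity, so the sketch settles on 10-year intervals, and the clusters found on the masked data match those found on the raw data. Real data would need multiple attributes, a proper generalization hierarchy, and a more robust cluster-comparison measure.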
Pages: 552-575
Page count: 24