Privacy protection in data mining: A perturbation approach for categorical data

Cited by: 21
Authors
Li, Xiao-Bai [1]
Sarkar, Sumit [2]
Affiliations
[1] Univ Massachusetts, Coll Management, Lowell, MA 01854 USA
[2] Univ Texas, Sch Management, Richardson, TX 75080 USA
Keywords
privacy; data confidentiality; data mining; linear programming; Bayesian estimation; data swapping
DOI
10.1287/isre.1060.0095
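Chinese Library Classification (CLC) number
G25 [Library science, librarianship]; G35 [Information science, information work]
Discipline classification code
1205; 120501
Abstract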
To respond to growing concerns about privacy of personal information, organizations that use their customers' records in data-mining activities are forced to take actions to protect the privacy of the individuals involved. A common practice for many organizations today is to remove identity-related attributes from the customer records before releasing them to data miners or analysts. We investigate the effect of this practice and demonstrate that many records in a data set could be uniquely identified even after identity-related attributes are removed. We propose a perturbation method for categorical data that can be used by organizations to prevent or limit disclosure of confidential data for identifiable records when the data are provided to analysts for classification, a common data-mining task. The proposed method attempts to preserve the statistical properties of the data based on privacy protection parameters specified by the organization. We show that the problem can be solved in two phases, with a linear programming formulation in Phase I (to preserve the first-order marginal distribution), followed by a simple Bayes-based swapping procedure in Phase II (to preserve the joint distribution).
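The abstract's claim that records can remain uniquely identifiable after identity-related attributes are removed can be illustrated with a small sketch. This is not the authors' procedure: the function name `unique_record_fraction`, the attribute names, and the toy records are all hypothetical, chosen only to show how combinations of the remaining categorical attributes can single out individuals.

```python
from collections import Counter


def unique_record_fraction(records, quasi_identifiers):
    """Return the fraction of records whose combination of
    quasi-identifier values occurs exactly once in the data set."""
    if not records:
        return 0.0
    # One key per record: the tuple of its quasi-identifier values.
    keys = [tuple(r[a] for a in quasi_identifiers) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(records)


# Hypothetical toy data: names and ID numbers have already been removed.
records = [
    {"zip": "01854", "gender": "F", "age_group": "30-39", "income": "high"},
    {"zip": "01854", "gender": "F", "age_group": "30-39", "income": "low"},
    {"zip": "01854", "gender": "M", "age_group": "40-49", "income": "high"},
    {"zip": "75080", "gender": "F", "age_group": "20-29", "income": "low"},
]

# Share of records that the three remaining attributes pin down uniquely.
fraction = unique_record_fraction(records, ["zip", "gender", "age_group"])
```

In this toy data set, two of the four records share the same attribute combination and the other two are unique. An analyst who knows a person's values for those three attributes could therefore read off the confidential `income` value for the unique records.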
Pages: 254-270
Page count: 17
Related papers
50 in total
  • [21] A Combined Random Noise Perturbation Approach for Multi Level Privacy Preservation in Data Mining
    Chidambaram, S.
    Srinivasagan, K. G.
    2014 INTERNATIONAL CONFERENCE ON RECENT TRENDS IN INFORMATION TECHNOLOGY (ICRTIT), 2014
  • [22] Random-data perturbation techniques and privacy-preserving data mining
    Hillol Kargupta
    Souptik Datta
    Qi Wang
    Krishnamoorthy Sivakumar
    Knowledge and Information Systems, 2005, 7 : 387 - 414
  • [23] A privacy protection technique for publishing data mining models and research data
    Fu Y.
    Chen Z.
    Koru G.
    Gangopadhyay A.
    ACM Transactions on Management Information Systems, 2010, 1 (01)
  • [24] Privacy-Leveled Perturbation Model for Privacy Preserving Collaborative Data Mining
    Shah, Alpa Kavin
    Gulati, Ravi
    PROCEEDINGS OF INTERNATIONAL CONFERENCE ON ICT FOR SUSTAINABLE DEVELOPMENT ICT4SD 2015, VOL 2, 2016, 409 : 233 - 240
  • [25] Privacy preserving sequential pattern mining based on data perturbation
    Ouyang, Wei-Min
    Xin, Hong-Liang
    Huang, Qin-Hua
    PROCEEDINGS OF 2007 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2007, : 3239 - +
  • [26] Privacy Protection Practice for Data Mining with Multiple Data Sources: An Example with Data Clustering
    O'Shaughnessy, Pauline
    Lin, Yan-Xia
    MATHEMATICS, 2022, 10 (24)
  • [27] Using cryptography for privacy protection in data mining systems
    Zhan, Justin
    WEB INTELLIGENCE MEETS BRAIN INFORMATICS, 2007, 4845 : 494 - +
  • [28] Designing of Privacy Protection Platform Based on Data Mining
    Zhou Bing
    Zeng Zhihua
    PROCEEDINGS OF THE 2015 INTERNATIONAL INDUSTRIAL INFORMATICS AND COMPUTER ENGINEERING CONFERENCE, 2015, : 1251 - 1254
  • [29] A privacy data protection algorithm for mining association rules
    Zhu, Yuquan
    Sun, Chao
    Chen, Geng
    Journal of Computational Information Systems, 2010, 6 (10): 3345 - 3352
  • [30] Random projection data perturbation based privacy protection in WSNs
    Ming, Zhao
    Zheng-Jiang, Wu
    Liu, Hui
    2017 IEEE CONFERENCE ON COMPUTER COMMUNICATIONS WORKSHOPS (INFOCOM WKSHPS), 2017, : 493 - 498