Anonymizing classification data for privacy preservation

Cited by: 175
Authors
Fung, Benjamin C. M. [1]
Wang, Ke
Yu, Philip S.
Affiliations
[1] Simon Fraser Univ, Sch Comp Sci, Burnaby, BC V5A 1S6, Canada
[2] IBM Corp, TJ Watson Res Ctr, Hawthorne, NY 10532 USA
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
privacy protection; anonymity; security; integrity; data mining; classification; data sharing;
DOI
10.1109/TKDE.2007.1015
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Classification is a fundamental problem in data analysis. Training a classifier requires accessing a large collection of data. Releasing person-specific data, such as customer data or patient records, may pose a threat to an individual's privacy. Even after removing explicit identifying information such as Name and SSN, it is still possible to link released records back to their identities by matching some combination of nonidentifying attributes such as {Sex; Zip; Birthdate}. A useful approach to combat such linking attacks, called k-anonymization [1], is anonymizing the linking attributes so that at least k released records match each value combination of the linking attributes. Previous work attempted to find an optimal k-anonymization that minimizes some data distortion metric. We argue that minimizing the distortion to the training data is not relevant to the classification goal that requires extracting the structure of prediction on the "future" data. In this paper, we propose a k-anonymization solution for classification. Our goal is to find a k-anonymization, not necessarily optimal in the sense of minimizing data distortion, which preserves the classification structure. We conducted intensive experiments to evaluate the impact of anonymization on the classification on future data. Experiments on real-life data show that the quality of classification can be preserved even for highly restrictive anonymity requirements.
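The k-anonymity condition described in the abstract (at least k released records for each value combination of the linking attributes) can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the toy records, the function names `is_k_anonymous` and `coarsen`, and the masking scheme (keep a 3-digit Zip prefix, keep only the birth year) are all assumptions made up for illustration.

```python
from collections import Counter

# Linking attributes named in the abstract; "Class" is a hypothetical label column.
LINKING_ATTRS = ("Sex", "Zip", "Birthdate")

def is_k_anonymous(records, linking_attrs, k):
    """True if every value combination of the linking attributes
    that occurs in `records` occurs at least k times."""
    counts = Counter(tuple(r[a] for a in linking_attrs) for r in records)
    return all(c >= k for c in counts.values())

def coarsen(record):
    """Illustrative masking only (an assumption, not the paper's method):
    keep the first 3 Zip digits and the birth year."""
    out = dict(record)
    out["Zip"] = record["Zip"][:3] + "**"
    out["Birthdate"] = record["Birthdate"][:4]
    return out

# Made-up toy data.
raw = [
    {"Sex": "M", "Zip": "53715", "Birthdate": "1970-03-02", "Class": "Y"},
    {"Sex": "M", "Zip": "53710", "Birthdate": "1970-11-20", "Class": "N"},
    {"Sex": "F", "Zip": "53703", "Birthdate": "1982-06-15", "Class": "Y"},
    {"Sex": "F", "Zip": "53706", "Birthdate": "1982-01-09", "Class": "Y"},
]
masked = [coarsen(r) for r in raw]

raw_ok = is_k_anonymous(raw, LINKING_ATTRS, 2)        # each raw record is unique
masked_ok = is_k_anonymous(masked, LINKING_ATTRS, 2)  # two groups of two records
```

In this toy data every raw record has a unique combination of linking attributes, so the raw table fails the check for k = 2, while the coarsened table passes. The sketch only verifies the property; choosing a masking that also preserves classification structure is the problem the paper addresses.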
Pages: 711-725
Page count: 15
Related papers
50 in total
  • [1] Anonymizing Classification Data for Preserving Privacy
    Chettri, Sarat Kr.
    Borah, B.
    [J]. SECURITY IN COMPUTING AND COMMUNICATIONS (SSCC 2015), 2015, 536 : 99 - 109
  • [2] Anonymizing streaming data for privacy protection
    Li, Jianzhong
    Ooi, Beng Chin
    Wang, Weiping
    [J]. 2008 IEEE 24TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, VOLS 1-3, 2008, : 1367 - +
  • [3] Data Quality in Privacy Preservation for Associative Classification
    Harnsamut, Nattapon
    Natwichai, Juggapong
    Sun, Xingzhi
    Li, Xue
    [J]. ADVANCED DATA MINING AND APPLICATIONS, PROCEEDINGS, 2008, 5139 : 111 - +
  • [4] Privacy Preservation by k-anonymizing Ngrams of Time series
    Zare-Mirakabad, Mohammad-Reza
    Kaveh-Yazdy, Fatemeh
    Tahmasebi, Mohammad
    [J]. 2013 10TH INTERNATIONAL ISC CONFERENCE ON INFORMATION SECURITY AND CRYPTOLOGY (ISCISC), 2013,
  • [5] Hiding classification rules for data sharing with privacy preservation
    Natwichai, J
    Li, X
    Orlowska, M
    [J]. DATA WAREHOUSING AND KNOWLEDGE DISCOVERY, PROCEEDINGS, 2005, 3589 : 468 - 477
  • [6] Discrimination Prevention with Classification and Privacy Preservation in Data mining
    KumarTripathi, Krishna
    [J]. PROCEEDINGS OF INTERNATIONAL CONFERENCE ON COMMUNICATION, COMPUTING AND VIRTUALIZATION (ICCCV) 2016, 2016, 79 : 244 - 253
  • [7] PRIVACY PRESERVATION FOR ASSOCIATIVE CLASSIFICATION
    Harnsamut, Nattapon
    Natwichai, Juggapong
    Sun, Xingzhi
    Li, Xue
    [J]. COMPUTATIONAL INTELLIGENCE, 2014, 30 (04) : 752 - 770
  • [8] Embracing Differential Privacy for Anonymizing Spontaneous ADE Reporting Data
    Lin, Wen-Yang
    Shen, Zhi-Xun
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE, 2020, : 2015 - 2022
  • [9] Anonymizing classification data using rough set theory
    Ye, Mingquan
    Wu, Xindong
    Hu, Xuegang
    Hu, Donghui
    [J]. KNOWLEDGE-BASED SYSTEMS, 2013, 43 : 82 - 94
  • [10] Anonymizing Healthcare Records: A Study of Privacy Preserving Data Publishing Techniques
    Jayabalan, Manoj
    Rana, Muhammad Ehsan
    [J]. ADVANCED SCIENCE LETTERS, 2018, 24 (03) : 1694 - 1697