Scalable Distributed Data Anonymization for Large Datasets

Cited by: 1
Authors
di Vimercati, Sabrina De Capitani [1 ]
Facchinetti, Dario [2 ]
Foresti, Sara [1 ]
Livraga, Giovanni [1 ]
Oldani, Gianluca
Paraboschi, Stefano [2 ]
Rossi, Matthew [2 ]
Samarati, Pierangela [1 ]
Affiliations
[1] Univ Milan, I-20122 Milan, MI, Italy
[2] Univ Bergamo, I-24129 Bergamo, BG, Italy
Keywords
Distributed data anonymization; Mondrian; k-anonymity; ℓ-diversity; Apache Spark; loose associations; privacy
DOI
10.1109/TBDATA.2022.3207521
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
k-Anonymity and ℓ-diversity are two well-known privacy metrics that protect the respondents of a dataset by obfuscating information that can disclose their identities and sensitive information. Existing solutions for enforcing them implicitly assume a centralized scenario, since they require complete visibility over the dataset to be anonymized, and can therefore have limited applicability in anonymizing large datasets. In this article, we propose a solution that extends Mondrian (an efficient and effective approach designed for achieving k-anonymity) to enforce both k-anonymity and ℓ-diversity over large datasets in a distributed manner, leveraging the parallel computation of multiple workers. Our approach efficiently distributes the computation among the workers, without requiring visibility over the dataset in its entirety. Our data partitioning limits the need for workers to exchange data, so that each worker can independently anonymize a portion of the dataset. We implemented our approach providing parallel execution on a dynamically chosen number of workers. The experimental evaluation shows that our solution provides scalability, while not affecting the quality of the resulting anonymization.
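To make the idea in the abstract concrete, the following is a minimal sketch of Mondrian-style k-anonymity applied independently to fragments of a dataset. It is an illustration under stated assumptions, not the authors' implementation: the record layout (tuples of numeric quasi-identifiers), the function names `mondrian`, `generalize`, and `anonymize_in_fragments`, and the sort-and-chunk fragmentation on the first attribute are all hypothetical, and ℓ-diversity is not checked. A sequential loop stands in for the parallel workers.

```python
from typing import List, Tuple

Record = Tuple[float, ...]  # assumption: numeric quasi-identifiers only


def spread(records: List[Record], d: int) -> float:
    """Width of the value range of attribute d over the given records."""
    values = [r[d] for r in records]
    return max(values) - min(values)


def mondrian(records: List[Record], k: int) -> List[List[Record]]:
    """Recursively split at the median of the widest attribute that
    still leaves at least k records on both sides of the cut."""
    if len(records) < 2 * k:
        return [records]  # too small to split into two groups of size k
    dims = sorted(range(len(records[0])),
                  key=lambda d: spread(records, d), reverse=True)
    for d in dims:
        ordered = sorted(records, key=lambda r: r[d])
        median = ordered[len(ordered) // 2][d]
        left = [r for r in ordered if r[d] < median]
        right = [r for r in ordered if r[d] >= median]
        if len(left) >= k and len(right) >= k:
            return mondrian(left, k) + mondrian(right, k)
    return [records]  # no allowable cut on any attribute


def generalize(partition: List[Record]) -> List[Tuple[Tuple[float, float], ...]]:
    """Replace every value by the (min, max) range of its partition."""
    dims = range(len(partition[0]))
    ranges = tuple((min(r[d] for r in partition),
                    max(r[d] for r in partition)) for d in dims)
    return [ranges for _ in partition]


def anonymize_in_fragments(records: List[Record], k: int,
                           workers: int) -> List[List[Record]]:
    """Split the data into one contiguous fragment per worker (sorted on
    attribute 0, a simplification) and anonymize each independently."""
    n = len(records)
    if n // workers < k:
        raise ValueError("each fragment needs at least k records")
    ordered = sorted(records, key=lambda r: r[0])
    bounds = [i * n // workers for i in range(workers + 1)]
    result: List[List[Record]] = []
    for lo, hi in zip(bounds, bounds[1:]):
        # In a real deployment each fragment would go to a separate worker.
        result.extend(mondrian(ordered[lo:hi], k))
    return result


# Hypothetical (age, ZIP code) records, for illustration only.
records = [(25, 20100), (27, 20100), (31, 20121), (35, 20121),
           (42, 24129), (47, 24129), (51, 24100), (58, 24100)]
partitions = anonymize_in_fragments(records, k=2, workers=2)
released = [row for p in partitions for row in generalize(p)]
```

Because every fragment holds at least k records and Mondrian never produces a group smaller than k from such a fragment, each generalized record in `released` is shared by at least k rows without any worker needing to see another worker's data.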
Pages: 818-831
Page count: 14
Related papers
50 in total
  • [1] Scalable Distributed Data Anonymization
    di Vimercati, Sabrina De Capitani
    Facchinetti, Dario
    Foresti, Sara
    Oldani, Gianluca
    Paraboschi, Stefano
    Rossi, Matthew
    Samarati, Pierangela
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON PERVASIVE COMPUTING AND COMMUNICATIONS WORKSHOPS AND OTHER AFFILIATED EVENTS (PERCOM WORKSHOPS), 2021, : 401 - 403
  • [2] Artifact: Scalable Distributed Data Anonymization
    di Vimercati, Sabrina De Capitani
    Facchinetti, Dario
    Foresti, Sara
    Oldani, Gianluca
    Paraboschi, Stefano
    Rossi, Matthew
    Samarati, Pierangela
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON PERVASIVE COMPUTING AND COMMUNICATIONS WORKSHOPS AND OTHER AFFILIATED EVENTS (PERCOM WORKSHOPS), 2021, : 450 - 451
  • [3] Distributed Data Anonymization
    SheikhAlishahi, Mina
    Martinelli, Fabio
    [J]. IEEE 17TH INT CONF ON DEPENDABLE, AUTONOM AND SECURE COMP / IEEE 17TH INT CONF ON PERVAS INTELLIGENCE AND COMP / IEEE 5TH INT CONF ON CLOUD AND BIG DATA COMP / IEEE 4TH CYBER SCIENCE AND TECHNOLOGY CONGRESS (DASC/PICOM/CBDCOM/CYBERSCITECH), 2019, : 580 - 586
  • [4] Robust de-anonymization of large sparse datasets
    Narayanan, Arvind
    Shmatikov, Vitaly
    [J]. PROCEEDINGS OF THE 2008 IEEE SYMPOSIUM ON SECURITY AND PRIVACY, 2008, : 111 - 125
  • [5] A Framework for Data Clustering of Large Datasets in a Distributed Environment
    Swapna, Ch. Swetha
    Kumar, V. Vijaya
    Murthy, J. V. R.
    [J]. PROCEEDINGS OF THE SECOND INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATION TECHNOLOGIES, IC3T 2015, VOL 1, 2016, 379 : 425 - 441
  • [6] A Scalable Approach for Distributed Reasoning over Large-scale OWL Datasets
    Mohamed, Heba
    Fathalla, Said
    Lehmann, Jens
    Jabeen, Hajira
    [J]. PROCEEDINGS OF THE 13TH INTERNATIONAL JOINT CONFERENCE ON KNOWLEDGE DISCOVERY, KNOWLEDGE ENGINEERING AND KNOWLEDGE MANAGEMENT (KEOD), VOL 2, 2021, : 51 - 60
  • [7] HMSPKmerCounter: Hadoop based Parallel, Scalable, Distributed Kmer Counter for Large Datasets
    Saravanan, S.
    Athri, Prashanth
    [J]. 2018 INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND SYSTEMS BIOLOGY (BSB), 2018, : 112 - 118
  • [8] A Parallel Method for Scalable Anonymization of Transaction Data
    Memon, Neelam
    Loukides, Grigorios
    Shao, Jianhua
    [J]. 2015 14TH INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED COMPUTING (ISPDC), 2015, : 235 - 241
  • [9] A flexible approach to distributed data anonymization
    Kohlmayer, Florian
    Prasser, Fabian
    Eckert, Claudia
    Kuhn, Klaus A.
    [J]. JOURNAL OF BIOMEDICAL INFORMATICS, 2014, 50 : 62 - 76
  • [10] Classification of Datasets Used in Data Anonymization for IoT Environment
    Medkova, Jana
    [J]. ADVANCES AND TRENDS IN ARTIFICIAL INTELLIGENCE: THEORY AND APPLICATIONS, IEA-AIE 2024, 2024, 14748 : 80 - 92