A hybrid approach for scalable sub-tree anonymization over big data using Map Reduce on cloud

Cited by: 72
Authors
Zhang, Xuyun [1]
Liu, Chang [1]
Nepal, Surya [2]
Yang, Chi [1]
Dou, Wanchun [3]
Chen, Jinjun [1]
Affiliations
[1] Univ Technol Sydney, Fac Engn & Informat Technol, Broadway, NSW 2007, Australia
[2] CSIRO, Ctr Informat & Commun Technol, Marsfield, NSW 2122, Australia
[3] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing 210093, Jiangsu, Peoples R China
Keywords
Big data; Cloud computing; Data anonymization; Privacy preservation; MapReduce
DOI
10.1016/j.jcss.2014.02.007
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In big data applications, data privacy is one of the most pressing concerns, because processing large-scale privacy-sensitive data sets often requires computation resources provisioned by public cloud services. Sub-tree data anonymization is a widely adopted scheme for anonymizing data sets to preserve privacy. Top-Down Specialization (TDS) and Bottom-Up Generalization (BUG) are two ways to achieve sub-tree anonymization. However, existing approaches to sub-tree anonymization lack parallelization capability and therefore do not scale to big data in the cloud. Moreover, both TDS and BUG, used individually, perform poorly for certain values of the k-anonymity parameter. In this paper, we propose a hybrid approach that combines TDS and BUG for efficient sub-tree anonymization over big data. Further, we design MapReduce algorithms for the two components (TDS and BUG) to gain high scalability. Experimental evaluation demonstrates that the hybrid approach significantly improves the scalability and efficiency of the sub-tree anonymization scheme over existing approaches. © 2014 Elsevier Inc. All rights reserved.
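As a rough illustration of the terms in the abstract, the following minimal single-machine sketch shows a k-anonymity check and a bottom-up, sub-tree style generalization over one attribute. It is not the paper's algorithm: the taxonomy tree `PARENT`, the function names, and the "generalize the rarest value first" heuristic are all assumptions made for illustration, and the MapReduce parallelization and the TDS component are not shown.

```python
from collections import Counter

# Hypothetical taxonomy tree for one quasi-identifier (child -> parent).
PARENT = {
    "Engineer": "Technical",
    "Programmer": "Technical",
    "Lawyer": "Professional",
    "Doctor": "Professional",
    "Technical": "Any",
    "Professional": "Any",
}


def is_k_anonymous(values, k):
    """Return True if every distinct value occurs at least k times."""
    return all(count >= k for count in Counter(values).values())


def generalize_subtree(values, parent):
    """Sub-tree generalization: replace every child of `parent` by `parent`."""
    return [parent if PARENT.get(v) == parent else v for v in values]


def bottom_up_generalize(values, k):
    """Generalize the sub-tree of the rarest value until k-anonymity holds.

    The rarest-value heuristic is an assumption for this sketch; it stands in
    for whatever selection metric a real implementation would use.
    """
    values = list(values)
    while not is_k_anonymous(values, k):
        counts = Counter(values)
        rarest = min(counts, key=counts.get)
        if rarest not in PARENT:  # already at the root, cannot generalize further
            break
        values = generalize_subtree(values, PARENT[rarest])
    return values


jobs = ["Engineer", "Programmer", "Programmer", "Lawyer", "Doctor", "Doctor"]
anonymized = bottom_up_generalize(jobs, k=2)
```

In this sketch, generalizing one child to its parent forces all of its siblings to the same parent, which is the defining constraint of the sub-tree scheme.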
Pages: 1008-1020
Page count: 13
Related papers
50 in total
  • [31] Big Data Analytics using Hadoop Map Reduce Framework and Data Migration Process
    Bante, Payal M.
    Rajeswari, K.
    2017 INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION, CONTROL AND AUTOMATION (ICCUBEA), 2017,
  • [32] Enhancing Cloud Data Privacy with a Scalable Hybrid Approach: HE-DPSMC
    Singh, Jaibir
    MallaReddy, A.
    Bande, Vasavi
    Lakshmanarao, A.
    Rao, Goda Srinivasa
    Samunnisa, K.
    JOURNAL OF ELECTRICAL SYSTEMS, 2023, 19 (04) : 350 - 375
  • [33] A Scalable Data Chunk Similarity Based Compression Approach for Efficient Big Sensing Data Processing on Cloud
    Yang, Chi
    Chen, Jinjun
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2017, 29 (06) : 1144 - 1157
  • [34] Programmers' de-anonymization using a hybrid approach of abstract syntax tree and deep learning
    Ullah, Farhan
    Jabbar, Sohail
    Al-Turjman, Fadi
    TECHNOLOGICAL FORECASTING AND SOCIAL CHANGE, 2020, 159
  • [35] Subgroup discovery on Big Data: exhaustive methodologies using Map-Reduce
    Padillo, F.
    Luna, J. M.
    Ventura, S.
    2016 IEEE TRUSTCOM/BIGDATASE/ISPA, 2016, : 1684 - 1691
  • [36] A hybrid optimization approach using Evolutionary Computing and Map Reduce Architecture
    Lohani, Bhanu Prakash
    Singh, Ajit
    Bibhu, Vimal
    PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING & COMMUNICATION ENGINEERING (ICACCE-2019), 2019,
  • [37] Scalable preference queries for high-dimensional data using map-reduce
    Guzun, Gheorghi
    Tosado, Joel E.
    Canahuate, Guadalupe
    PROCEEDINGS 2015 IEEE INTERNATIONAL CONFERENCE ON BIG DATA, 2015, : 2243 - 2252
  • [38] Enabling Big Data Analytics in the Hybrid Cloud using Iterative MapReduce
    Clemente-Castello, Francisco J.
    Nicolae, Bogdan
    Katrinis, Kostas
    Rafique, M. Mustafa
    Mayo, Rafael
    Carlos Fernandez, Juan
    Loreti, Daniela
    2015 IEEE/ACM 8TH INTERNATIONAL CONFERENCE ON UTILITY AND CLOUD COMPUTING (UCC), 2015, : 290 - 299
  • [39] Memory Scaling of Cloud-Based Big Data Systems: A Hybrid Approach
    Wang, Xinying
    Xu, Cong
    Wang, Ke
    Yan, Feng
    Zhao, Dongfang
    IEEE TRANSACTIONS ON BIG DATA, 2022, 8 (05) : 1259 - 1272
  • [40] An Experimental Approach Towards Big Data for Analyzing Memory Utilization on a Hadoop cluster using HDFS and Map Reduce.
    Pal, Amrit
    Agrawal, Sanjay
    2014 FIRST INTERNATIONAL CONFERENCE ON NETWORKS & SOFT COMPUTING (ICNSC), 2014, : 442 - 447