Efficient approximation and privacy preservation algorithms for real time online evolving data streams

Cited by: 4
Authors
Patil, Rahul A. [1, 2]
Patil, Pramod D. [1]
Affiliations
[1] Dr D Y Patil Inst Technol, Pimpri Pune 411018, Maharashtra, India
[2] Pimpri Chinchwad Coll Engn, Pune 411044, Maharashtra, India
Keywords
Approximation; Data streaming; Clustering; k-anonymization; l-diversity; Privacy preservation; Anonymization
DOI
10.1007/s11280-024-01244-9
CLC number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Because it requires processing continuous, large, unstructured data streams, mining real-time streaming data is a more challenging research problem than mining static data. Privacy becomes a concern when the streaming data contains sensitive information. In recent years, there has been significant progress in research on the anonymization of static data, where generalization and suppression are the two typical strategies for anonymizing quasi-identifiers. However, the high dynamicity and potentially infinite length of streaming data make its anonymization a challenging task. To this end, we propose a novel Efficient Approximation and Privacy Preservation Algorithms (EAPPA) framework that performs efficient pre-processing of live streaming data and preserves its privacy with minimal Information Loss (IL) and computational cost. Because existing privacy preservation solutions for streaming data suffer from redundant data, we first propose an efficient data approximation technique combined with data pre-processing. We apply the Flajolet-Martin (FM) algorithm, together with a data cleaning mechanism, to approximate the number of unique elements in the data stream robustly and efficiently. The periodically approximated and pre-processed streaming data are then fed to the anonymization algorithm. Using adaptive clustering, we propose novel k-anonymization and l-diversity schemes for data streams. The proposed approach scans the stream to detect and reuse clusters that satisfy the k-anonymity and l-diversity criteria, reducing both anonymization time and IL. Experimental results show that the EAPPA framework is more efficient than state-of-the-art methods.
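The abstract relies on two standard building blocks: Flajolet-Martin (FM) approximation of the number of distinct items in a stream, and cluster-based k-anonymity and l-diversity enforced through generalization and suppression of quasi-identifiers. The sketch below is a minimal, hypothetical illustration of both, not the authors' EAPPA implementation. Assumptions: the function names (`fm_estimate`, `satisfies_k_l`, `generalize_cluster`, `anonymize_window`), the SHA-256-based hash family, the 64 FM bitmaps, and the greedy sort-and-cut grouping over a buffered window are all illustrative choices; the paper's adaptive clustering and cluster reuse are not reproduced.

```python
import hashlib
from typing import Any, Dict, Hashable, Iterable, List

Record = Dict[str, Any]
PHI = 0.77351  # bias-correction constant from Flajolet and Martin (1985)


def _hash64(item: Hashable, seed: int) -> int:
    """Hypothetical hash family: SHA-256 of a seeded repr, truncated to 64 bits."""
    digest = hashlib.sha256(f"{seed}:{item!r}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def _trailing_zeros(x: int, width: int = 64) -> int:
    """Index of the lowest set bit of x (width if x == 0)."""
    return width if x == 0 else (x & -x).bit_length() - 1


def fm_estimate(stream: Iterable[Hashable], num_hashes: int = 64) -> float:
    """One-pass Flajolet-Martin estimate of the number of distinct items."""
    bitmaps = [0] * num_hashes
    for item in stream:
        for seed in range(num_hashes):
            # Set bit rho(h(x)); a duplicate sets the same bit again, so it costs nothing.
            bitmaps[seed] |= 1 << _trailing_zeros(_hash64(item, seed))
    total_r = 0
    for bitmap in bitmaps:
        r = 0
        while bitmap & (1 << r):  # R = index of the lowest unset bit
            r += 1
        total_r += r
    # Average R over all bitmaps, then correct the bias: n ~ 2^mean(R) / phi.
    return 2 ** (total_r / num_hashes) / PHI


def satisfies_k_l(cluster: List[Record], k: int, l: int, sensitive: str) -> bool:
    """k-anonymity (size >= k) and distinct l-diversity (>= l sensitive values)."""
    return len(cluster) >= k and len({r[sensitive] for r in cluster}) >= l


def generalize_cluster(cluster: List[Record], quasi_ids: List[str], sensitive: str) -> List[Record]:
    """Generalize numeric quasi-identifiers to a min-max range; suppress disagreeing categorical ones."""
    generalized: Record = {}
    for qid in quasi_ids:
        values = [r[qid] for r in cluster]
        if all(isinstance(v, (int, float)) for v in values):
            lo, hi = min(values), max(values)
            generalized[qid] = str(lo) if lo == hi else f"{lo}-{hi}"
        elif len(set(values)) == 1:
            generalized[qid] = values[0]  # all records agree: keep the value
        else:
            generalized[qid] = "*"  # suppression
    return [{**generalized, sensitive: r[sensitive]} for r in cluster]


def anonymize_window(records: List[Record], quasi_ids: List[str], sensitive: str,
                     k: int, l: int) -> List[Record]:
    """Greedy stand-in for clustering: sort by the first quasi-identifier and
    cut consecutive groups as soon as they are k-anonymous and l-diverse."""
    ordered = sorted(records, key=lambda r: r[quasi_ids[0]])
    clusters: List[List[Record]] = []
    current: List[Record] = []
    for rec in ordered:
        current.append(rec)
        if satisfies_k_l(current, k, l, sensitive):
            clusters.append(current)
            current = []
    if current:
        if not clusters:
            return []  # the window cannot be anonymized: suppress it entirely
        clusters[-1].extend(current)  # merging leftovers keeps the last cluster valid
    return [row for c in clusters for row in generalize_cluster(c, quasi_ids, sensitive)]
```

Because each FM bitmap depends only on the set of hashed values, replaying a stream with many duplicates yields exactly the same estimate, which is what makes FM attractive for redundant streams. Merging leftover records into the last cluster keeps every published group at least k records large and l-diverse, at the price of extra information loss for that group.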
Pages: 20
Related papers (50 in total)
  • [21] WIP: Towards Optimal Online Approximation of Data Streams
    Sitbon, Phillip
    Bulusu, Nirupama
    Feng, Wu-chi
    2011 INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING IN SENSOR SYSTEMS AND WORKSHOPS (DCOSS), 2011
  • [22] Privacy-Preserving for Dynamic Real-Time Published Data Streams Based on Local Differential Privacy
    Gao, Wen
    Zhou, Siwang
    IEEE INTERNET OF THINGS JOURNAL, 2024, 11 (08): 13551-13562
  • [23] Online Evaluation of Patterns from Evolving Web Data Streams
    Rojas, Carlos
    Nasraoui, Olfa
    2009 IEEE/WIC/ACM INTERNATIONAL JOINT CONFERENCES ON WEB INTELLIGENCE (WI) AND INTELLIGENT AGENT TECHNOLOGIES (IAT), VOL 1, 2009: 315-318
  • [24] Classification of high-dimensional evolving data streams via a resource-efficient online ensemble
    Zhai, Tingting
    Gao, Yang
    Wang, Hao
    Cao, Longbing
    DATA MINING AND KNOWLEDGE DISCOVERY, 2017, 31 (05): 1242-1265
  • [26] Efficient privacy preservation of big data for accurate data mining
    Chamikara, M. A. P.
    Bertok, P.
    Liu, D.
    Camtepe, S.
    Khalil, I.
    INFORMATION SCIENCES, 2020, 527: 420-443
  • [27] Accelerated Real-Time Classification of Evolving Data Streams using Adaptive Random Forests
    Ridder, Frank
    Chen, Kuan-Hsun
    Alachiotis, Nikolaos
    2023 INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE TECHNOLOGY, ICFPT, 2023: 232-237
  • [28] An efficient and scalable privacy preserving algorithm for big data and data streams
    Chamikara, M. A. P.
    Bertok, P.
    Liu, D.
    Camtepe, S.
    Khalil, I.
    COMPUTERS & SECURITY, 2019, 87
  • [29] Hierarchical Clustering of Data Streams: Scalable Algorithms and Approximation Guarantees
    Rajagopalan, Anand
    Vitale, Fabio
    Vainstein, Danny
    Citovsky, Gui
    Procopiuc, Cecilia M.
    Gentile, Claudio
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021
  • [30] Approximation Algorithms for Massive High-Rate Data Streams
    Cuzzocrea, Alfredo
    NEW TRENDS IN DATABASES AND INFORMATION SYSTEMS, 2013, 185: 59-68