Efficient approximation and privacy preservation algorithms for real time online evolving data streams

Cited by: 4
Authors
Patil, Rahul A. [1, 2]
Patil, Pramod D. [1 ]
Affiliations
[1] Dr D Y Patil Inst Technol, Pimpri Pune 411018, Maharashtra, India
[2] Pimpri Chinchwad Coll Engn, Pune 411044, Maharashtra, India
Keywords
Approximation; Data streaming; Clustering; k-anonymization; l-diversity; Privacy preservation; ANONYMIZATION;
DOI
10.1007/s11280-024-01244-9
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Mining real-time streaming data is a more challenging research problem than mining static data because it requires processing large, continuous, unstructured streams. Privacy concerns arise when the streaming data includes sensitive information. In recent years, there has been significant progress in research on the anonymization of static data. For the anonymization of quasi-identifiers, two typical strategies are generalization and suppression. However, the highly dynamic and potentially unbounded nature of streaming data makes anonymization a challenging task. To this end, we propose a novel Efficient Approximation and Privacy Preservation Algorithms (EAPPA) framework in this paper to achieve efficient pre-processing of live streaming data and its privacy preservation with minimum Information Loss (IL) and computational requirements. As the existing privacy preservation solutions for streaming data suffer from the challenges of redundant data, we first propose an efficient technique of data approximation with data pre-processing. We design the Flajolet-Martin (FM) algorithm for robust and efficient approximation of unique elements in the data stream with a data cleaning mechanism. We feed the periodically approximated and pre-processed streaming data to the anonymization algorithm. Using adaptive clustering, we propose innovative k-anonymization and l-diversity privacy principles for data streams. The proposed approach scans a stream to detect and reuse clusters that fulfill the k-anonymity and l-diversity criteria, reducing anonymization time and IL. The experimental results reveal the efficiency of the EAPPA framework compared to state-of-the-art methods.
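
The abstract names two building blocks: Flajolet-Martin estimation of unique elements, and a check that a cluster meets the k-anonymity and l-diversity criteria. The Python sketch below illustrates the textbook versions of both and is not the EAPPA implementation. All names (`fm_estimate`, `satisfies_k_l`, `num_hashes`, `sensitive_key`) are hypothetical, the hash construction is an arbitrary choice, and l-diversity is assumed to mean "at least l distinct sensitive values", which may differ from the definition used in the paper.

```python
import hashlib

PHI = 0.77351  # correction constant from the Flajolet-Martin analysis


def _hash32(item, seed):
    """Deterministic 32-bit hash of an item (seeded SHA-256, an illustrative choice)."""
    digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def _trailing_zeros(x, width=32):
    """Number of trailing zero bits in x (width if x is zero)."""
    if x == 0:
        return width
    return (x & -x).bit_length() - 1


def _lowest_unset_bit(bitmap):
    """Index of the least significant zero bit in the bitmap."""
    r = 0
    while bitmap & (1 << r):
        r += 1
    return r


def fm_estimate(stream, num_hashes=32):
    """Approximate the number of distinct items in a stream.

    Keeps one bitmap per hash function, so memory does not grow
    with the stream length. Returns 0.0 for an empty stream.
    """
    bitmaps = [0] * num_hashes
    for item in stream:
        for seed in range(num_hashes):
            bitmaps[seed] |= 1 << _trailing_zeros(_hash32(item, seed))
    if not any(bitmaps):
        return 0.0
    # Average the per-hash R values to reduce variance, then rescale.
    mean_r = sum(_lowest_unset_bit(b) for b in bitmaps) / num_hashes
    return (2 ** mean_r) / PHI


def satisfies_k_l(cluster, k, l, sensitive_key):
    """True if a cluster of records has at least k members and
    at least l distinct values of the sensitive attribute
    (distinct l-diversity, an assumption of this sketch)."""
    if len(cluster) < k:
        return False
    distinct_sensitive = {record[sensitive_key] for record in cluster}
    return len(distinct_sensitive) >= l
```

Because the sketch only sets bits, repeated items never change the estimate, which is what makes it suitable for streams with redundant records. A cluster that passes `satisfies_k_l` could be kept and reused for later tuples, in the spirit of the cluster reuse the abstract describes.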
Pages: 20
Related papers
50 in total
  • [31] A Novel Online Real-time Classifier for Multi-label Data Streams
    Venkatesan, Rajasekar
    Er, Meng Joo
    Wu, Shiqian
    Pratama, Mahardhika
    2016 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2016, : 1833 - 1840
  • [32] MovStream: An Efficient Algorithm for Monitoring Clusters Evolving in Data Streams
    Tang, Liang
    Tang, Chang-jie
    Duan, Lei
    Li, Chuan
    Jiang, Ye-xi
    Zeng, Chun-qiu
    Zhu, Jun
    2008 IEEE INTERNATIONAL CONFERENCE ON GRANULAR COMPUTING, VOLS 1 AND 2, 2008, : 582 - +
  • [33] PatHT: An Efficient Method of Classification over Evolving Data Streams
    Han, Meng
    Ding, Jian
    Li, Juan
    INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2019, 16 (06) : 1098 - 1105
  • [34] Meta Expert Learning and Efficient Pruning for Evolving Data Streams
    Azarafrooz, Mahdi
    Daneshmand, Mahmoud
    IEEE INTERNET OF THINGS JOURNAL, 2015, 2 (04) : 268 - 273
  • [35] A Sketch-Based Naive Bayes Algorithms for Evolving Data Streams
    Bahri, Maroua
    Maniu, Silviu
    Bifet, Albert
    2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2018, : 604 - 613
  • [36] Efficient online subsequence searching in data streams under Dynamic Time Warping distance
    Zhou, Mi
    Wong, Man Hon
    2008 IEEE 24TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, VOLS 1-3, 2008, : 686 - +
  • [37] Online reliable semi-supervised learning on evolving data streams
    Din, Salah Ud
    Shao, Junming
    Kumar, Jay
    Ali, Waqar
    Liu, Jiaming
    Ye, Yu
    INFORMATION SCIENCES, 2020, 525 (525) : 153 - 171
  • [38] Online GRNN-Based Ensembles for Regression on Evolving Data Streams
    Duda, Piotr
    Jaworski, Maciej
    Rutkowski, Leszek
    ADVANCES IN NEURAL NETWORKS - ISNN 2018, 2018, 10878 : 221 - 228
  • [39] CPOCEDS-concept preserving online clustering for evolving data streams
    Jafseer, K. T.
    Shailesh, S.
    Sreekumar, A.
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2024, 27 (03) : 2983 - 2998
  • [40] Fully online clustering of evolving data streams into arbitrarily shaped clusters
    Hyde, Richard
    Angelov, Plamen
    MacKenzie, A. R.
    INFORMATION SCIENCES, 2017, 382 : 96 - 114