Variance Feedback Drift Detection Method for Evolving Data Streams Mining

Cited by: 0
Authors
Han, Meng [1, 2]
Meng, Fanxing [1 ]
Li, Chunpeng [1 ]
Affiliations
[1] North Minzu Univ, Sch Comp Sci & Engn, Yinchuan 750021, Peoples R China
[2] North Minzu Univ, Key Lab Images & Graph Intelligent Proc State Ethn, Yinchuan 750021, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 16
Funding
National Natural Science Foundation of China
Keywords
concept drift; variance; data stream; classification; statistical test; ONLINE;
DOI
10.3390/app14167157
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Learning from changing data streams is an important task in data mining. A change over time in the underlying distribution of a data stream is called concept drift. In classification, concept drift can sharply degrade the performance of the original classifier, because the old decision model no longer suits the new data environment. Handling concept drift in changing data streams is therefore crucial to maintaining classifier performance. Most current concept drift detection methods apply the same detection strategy to every data stream and pay little attention to the characteristics of each individual stream, which limits the adaptability of drift detectors across environments. To address this issue, we first propose a variance estimation strategy and a variance feedback strategy that characterize a data stream through its variance. Based on this variance, we develop drift detection schemes tailored to each data stream, thereby improving the adaptability of drift detection in different environments. We conducted experiments on data streams with various types of drift. The results show that our algorithm achieves the best average accuracy ranking on the synthetic datasets, with an overall ranking 1.12 to 1.5 higher than that of the next-best algorithm. Compared with algorithms that use the same statistical tests, our method improves the ranking by 3 to 3.5 for the Hoeffding test and by 1.12 to 2.25 for the McDiarmid test. It also achieves a good balance between detection delay and false positive rate. Overall, our algorithm ranks higher than existing drift detection methods across the four key metrics of accuracy, CPU time, false positives, and detection delay.
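The abstract refers to drift detectors built on the Hoeffding test. As a rough illustration of that general idea only, the sketch below compares the mean error of a fixed reference window with that of a sliding recent window and flags drift when the gap exceeds a two-sample Hoeffding bound. It is not the variance feedback method of the paper: the function names, the window size of 200, and the confidence parameter `delta` are all assumptions chosen for the example.

```python
import math


def hoeffding_epsilon(n_ref, n_recent, delta):
    """Two-sided Hoeffding bound on the difference between two sample
    means of independent values in [0, 1], at confidence 1 - delta."""
    return math.sqrt((1.0 / n_ref + 1.0 / n_recent) * math.log(2.0 / delta) / 2.0)


def detect_drift(errors, window=200, delta=0.001):
    """Return the index of the first element at which drift is flagged,
    or None if no drift is flagged.

    errors: list of 0/1 prediction errors (1 = misclassified).
    The first `window` values form a fixed reference window; the most
    recent `window` values form a sliding window (hypothetical design,
    not taken from the paper).
    """
    if len(errors) < 2 * window:
        return None
    ref_mean = sum(errors[:window]) / window
    eps = hoeffding_epsilon(window, window, delta)
    recent_sum = sum(errors[window:2 * window])
    for t in range(2 * window, len(errors) + 1):
        # The recent window currently covers errors[t - window : t].
        if abs(recent_sum / window - ref_mean) > eps:
            return t - 1
        if t < len(errors):
            # Slide the window forward by one element.
            recent_sum += errors[t] - errors[t - window]
    return None


if __name__ == "__main__":
    # Deterministic toy stream: 10% error rate, then 50% from index 1000 on.
    stream = [1 if i % 10 == 0 else 0 for i in range(1000)]
    stream += [1 if i % 2 == 0 else 0 for i in range(1000)]
    print("drift flagged at index:", detect_drift(stream))
```

A fixed threshold like this treats every stream identically, which is the limitation the abstract describes; the paper's contribution is to adapt detection to each stream through its variance, a step this sketch does not attempt.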
Pages: 29