Active Learning Method for Imbalanced Concept Drift Data Stream

Cited by: 0
Authors
Li Y.-H. [1, 2]
Wang T.-T. [1, 2]
Wang S.-G. [1, 2]
Li D.-Y. [1, 2]
Affiliations
[1] School of Computer and Information Technology, Shanxi University, Taiyuan
[2] Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan
Funding
National Natural Science Foundation of China
Keywords
active learning; concept drift; data stream classification; multi-class imbalance
DOI
10.16383/j.aas.c230233
Abstract
Data stream classification studies how to provide more reliable data-driven prediction models in open and dynamic environments. The key is how to detect and adapt to concept drift in continuously changing data streams that arrive in real time. Currently, in order to detect concept drift and update classification models, data stream classification methods usually assume that the labels of all samples are known, which is unrealistic in real scenarios. Additionally, real data streams may exhibit high and constantly changing class imbalance ratios, further increasing the complexity of the data stream classification task. In this paper, we propose an active learning method for imbalanced concept drift data streams (ALM-ICDDS). First, we define a sample prediction certainty measure based on multiple prediction probabilities and propose an adaptive adjustment method for the margin threshold matrix, which makes the label query strategy suitable for imbalanced data streams with multiple classes. Then, we propose a sample replacement strategy based on memory strength, which retains in the memory window samples that are difficult to distinguish, samples from minority classes, and samples that represent the current data distribution, and thereby improves the classification performance of new base classifiers. Finally, we define an importance evaluation and update method for base classifiers based on classification accuracy, which updates the ensemble classifier after drift. Comparative experiments on seven synthetic data streams and three real data streams show that ALM-ICDDS achieves better classification performance than six concept drift data stream learning methods. © 2024 Science Press. All rights reserved.
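The label query step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact certainty measure or the rule for adapting the thresholds, so the code below assumes a plain top-two probability margin and a fixed per-class-pair threshold matrix. The function names `top_two_margin` and `should_query_label` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def top_two_margin(probs: Sequence[float]) -> Tuple[int, int, float]:
    """Return (most likely class, runner-up class, margin between their probabilities)."""
    if len(probs) < 2:
        raise ValueError("need at least two class probabilities")
    # Class indices sorted by predicted probability, highest first.
    order = sorted(range(len(probs)), key=lambda k: probs[k], reverse=True)
    first, second = order[0], order[1]
    return first, second, probs[first] - probs[second]


def should_query_label(probs: Sequence[float],
                       thresholds: List[List[float]]) -> bool:
    """Assumed query rule: ask for the true label when the top-two margin
    is below the threshold stored for that (predicted, runner-up) class pair."""
    first, second, margin = top_two_margin(probs)
    return margin < thresholds[first][second]


if __name__ == "__main__":
    n_classes = 3
    # Illustrative thresholds only; a larger value means more label queries for that pair.
    thresholds = [[0.2] * n_classes for _ in range(n_classes)]
    print(should_query_label([0.5, 0.4, 0.1], thresholds))    # uncertain prediction
    print(should_query_label([0.9, 0.05, 0.05], thresholds))  # confident prediction
```

Under these assumptions, keeping one threshold per class pair lets the query budget differ between pairs, for example by assigning larger thresholds to pairs that involve a minority class. How ALM-ICDDS actually adjusts the matrix is not stated in the abstract.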
Pages: 589-606
Number of pages: 17