Active Learning with Abstaining Classifiers for Imbalanced Drifting Data Streams

Authors
Korycki, Lukasz [1]
Cano, Alberto [1]
Krawczyk, Bartosz [1]
Affiliation
[1] Virginia Commonwealth University, Department of Computer Science, Richmond, VA 23284, USA
Keywords
machine learning; data stream mining; imbalanced data; active learning; ensemble learning; resampling ensemble
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Learning from data streams is one of the most promising and challenging domains of modern machine learning. Proliferating online data sources give us access to real-time knowledge that was never available before. At the same time, new obstacles emerge that must be overcome to fully and effectively exploit the potential of this data; prohibitive time and memory constraints and non-stationary distributions are only some of them. In classification tasks, effective adaptation must additionally be achieved on the weak foundation of partially labeled and often imbalanced data. In this work, we propose an online framework for binary classification that aims to handle the complex problem of learning from dynamic, sparsely labeled, and imbalanced streams. Its core component is a novel active learning strategy (MD-OAL) that prioritizes the labeling of minority instances and, as a result, improves the balance of the learning process. We combine this strategy with a dynamic ensemble of base learners that can abstain from making a decision when they are highly uncertain. We adjust the abstaining mechanism in favor of minority instances, providing an effective method for handling the remaining imbalance and concept drift simultaneously. The conducted evaluation shows that, in challenging and realistic scenarios, our framework outperforms state-of-the-art algorithms and offers higher resilience to the combined effect of limited labeling and imbalance.
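The abstract describes two mechanisms only at a high level: a budgeted labeling strategy that prioritizes minority instances, and an ensemble whose members abstain when uncertain, with abstention tuned in favor of the minority class. The Python sketch that follows illustrates one possible way to combine these ideas. It is not MD-OAL or the authors' ensemble: the class names (`AbstainingVoter`, `MinorityBiasedQuery`), the class-dependent thresholds (`theta_majority`, `theta_minority`), the budget fraction, and the query rule are all hypothetical assumptions chosen for illustration, and base-learner training and drift handling are omitted.

```python
# Hypothetical sketch only: NOT the MD-OAL algorithm from the paper.
# Thresholds, budget and query rule are illustrative assumptions.
from dataclasses import dataclass
from statistics import mean
from typing import Optional, Sequence

MINORITY, MAJORITY = 1, 0


@dataclass
class AbstainingVoter:
    """Class-dependent abstention rule (assumed thresholds)."""
    theta_majority: float = 0.8  # stricter: majority votes need high confidence
    theta_minority: float = 0.6  # looser: minority-leaning members abstain less

    def vote(self, p_minority: float) -> Optional[int]:
        """Return MINORITY, MAJORITY, or None when the member abstains."""
        if p_minority >= 0.5:
            return MINORITY if p_minority >= self.theta_minority else None
        return MAJORITY if 1.0 - p_minority >= self.theta_majority else None


def ensemble_predict(member_probs: Sequence[float], voter: AbstainingVoter) -> int:
    """Vote over non-abstaining members; ties are resolved toward the minority."""
    cast = [v for v in (voter.vote(p) for p in member_probs) if v is not None]
    if not cast:
        # Every member abstained: fall back to the averaged minority posterior.
        return MINORITY if mean(member_probs) >= 0.5 else MAJORITY
    # sum(cast) counts minority votes, since MINORITY == 1 and MAJORITY == 0.
    return MINORITY if 2 * sum(cast) >= len(cast) else MAJORITY


@dataclass
class MinorityBiasedQuery:
    """Hypothetical budgeted labeling rule favoring likely-minority instances."""
    budget: float = 0.05  # fraction of the stream that may be labeled
    margin: float = 0.2   # band below 0.5 that is still treated as uncertain
    seen: int = 0
    spent: int = 0

    def should_query(self, member_probs: Sequence[float]) -> bool:
        self.seen += 1
        if self.spent >= self.budget * self.seen:
            return False  # labeling budget exhausted for now
        # Query instances the ensemble finds uncertain or minority-leaning.
        if mean(member_probs) >= 0.5 - self.margin:
            self.spent += 1
            return True
        return False


if __name__ == "__main__":
    voter = AbstainingVoter()
    print(ensemble_predict([0.65, 0.55, 0.1], voter))  # one member abstains
    query = MinorityBiasedQuery(budget=0.5)
    print([query.should_query(p) for p in ([0.4, 0.45], [0.05, 0.1], [0.9, 0.8])])
```

Setting `theta_minority` below `theta_majority` makes members abstain less often when they lean toward the minority class, which is one simple reading of "adjusting the abstaining mechanism in favor of minority instances". Because predicted-minority instances are rare in an imbalanced stream, the query rule spends most of its budget on uncertain or minority-leaning instances.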
Pages: 2334-2343
Number of pages: 10