Active Learning with Abstaining Classifiers for Imbalanced Drifting Data Streams

Times cited: 0
Authors
Korycki, Lukasz [1]
Cano, Alberto [1]
Krawczyk, Bartosz [1]
Affiliation
[1] Virginia Commonwealth Univ, Dept Comp Sci, Richmond, VA 23284 USA
Keywords
machine learning; data stream mining; imbalanced data; active learning; ensemble learning; resampling ensemble
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Learning from data streams is one of the most promising and challenging areas of modern machine learning. Proliferating online data sources give us access to real-time knowledge that was previously unavailable. At the same time, new obstacles emerge that must be overcome to exploit the potential of such data fully and effectively; prohibitive time and memory constraints and non-stationary distributions are only some of them. In classification tasks, effective adaptation must be achieved on the weak foundation of partially labeled and often imbalanced data. In this work, we propose an online framework for binary classification that aims to handle the complex problem of learning from dynamic, sparsely labeled, and imbalanced streams. Its core is a novel active learning strategy (MD-OAL) that prioritizes labeling of minority-class instances and, as a result, improves the balance of the learning process. We combine this strategy with a dynamic ensemble of base learners that can abstain from making a decision when they are highly uncertain. We adjust the abstaining mechanism in favor of minority instances, which provides an effective way of handling the remaining imbalance and concept drift simultaneously. The conducted evaluation shows that, in challenging and realistic scenarios, our framework outperforms state-of-the-art algorithms and offers higher resilience to the combined effect of limited labeling and class imbalance.
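The abstract names two mechanisms only at a high level: a labeling strategy that favors minority instances, and base learners that abstain when uncertain, with abstention tuned in favor of the minority class. The following is a hypothetical sketch of those two ideas, not the authors' MD-OAL method. Assumptions: labels are binary with class 1 as the minority class, each base learner reports an estimate of P(y = 1), and the function names `abstaining_vote` and `should_query`, the thresholds `tau_minority` and `tau_majority`, the budget counter, and `p_majority` are all invented for illustration.

```python
import random
from typing import List, Optional

# Hypothetical illustration, NOT the MD-OAL algorithm from the paper.
# Assumptions: binary labels, class 1 is the minority class, and each base
# learner reports an estimate of P(y = 1) for the incoming instance.


def abstaining_vote(probas: List[float],
                    tau_minority: float = 0.6,
                    tau_majority: float = 0.8) -> Optional[int]:
    """Majority vote over base learners that may abstain.

    A learner votes 1 only if P(y=1) >= tau_minority and votes 0 only if
    P(y=0) >= tau_majority; otherwise it abstains. The lower minority
    threshold makes learners abstain less often on likely minority instances.
    Returns None when every learner abstains.
    """
    votes = []
    for p in probas:
        if p >= tau_minority:
            votes.append(1)
        elif 1.0 - p >= tau_majority:
            votes.append(0)
        # else: the learner is too uncertain and abstains
    if not votes:
        return None
    # Ties are broken in favor of the minority class.
    return 1 if 2 * sum(votes) >= len(votes) else 0


def should_query(prediction: Optional[int], spent: int, budget: int,
                 rng: random.Random, p_majority: float = 0.05) -> bool:
    """Decide whether to request the true label under a labeling budget.

    Instances the ensemble cannot decide on (None) or assigns to the
    minority class are always queried while budget remains; majority-class
    predictions are queried only with a small probability p_majority.
    """
    if spent >= budget:
        return False
    if prediction is None or prediction == 1:
        return True
    return rng.random() < p_majority


# Usage: a toy stream of per-learner probability estimates.
rng = random.Random(42)
stream = [[0.9, 0.7, 0.1], [0.3, 0.4, 0.5], [0.1, 0.05, 0.15]]
spent, budget = 0, 2
for probas in stream:
    pred = abstaining_vote(probas)
    if should_query(pred, spent, budget, rng):
        spent += 1  # here the true label would be fetched and learners updated
    print(pred, spent)
```

Biasing both the abstention thresholds and the query rule toward class 1 is one simple way to realize the two ideas together; the paper's actual criteria, thresholds, and drift handling may differ.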
Pages: 2334-2343
Number of pages: 10