Cost-Sensitive Perceptron Decision Trees for Imbalanced Drifting Data Streams

Cited by: 8
Authors
Krawczyk, Bartosz [1]
Skryjomski, Przemyslaw [1]
Affiliations
[1] Virginia Commonwealth Univ, Dept Comp Sci, Richmond, VA 23284 USA
Keywords
Machine learning; Data streams; Imbalanced data; Concept drift; Online learning; Multi-class imbalance
DOI
10.1007/978-3-319-71246-8_31
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Mining streaming and drifting data is among the most popular contemporary applications of machine learning methods. Due to the potentially unbounded number of instances arriving rapidly, evolving concepts, and limitations imposed on the available computational resources, there is a need to develop efficient and adaptive algorithms that can handle such problems. These learning difficulties can be further compounded by the appearance of skewed distributions as the stream progresses. Class imbalance in non-stationary scenarios is highly challenging, as not only may the imbalance ratio change over time, but so may the relationships among classes. In this paper we propose an efficient and fast cost-sensitive decision tree learning scheme for handling online class imbalance. In each leaf of the tree we train a perceptron with output adaptation to compensate for skewed class distributions, while McDiarmid's bound is used for controlling the splitting attribute selection. The cost matrix automatically adapts itself to the current imbalance ratio in the stream, allowing for a smooth compensation of evolving class relationships. Furthermore, we analyze characteristics of minority class instances and incorporate this information during the model update process. This allows our classifier to focus on the most difficult instances, while a sliding window keeps track of changes in class structures. Experimental analysis carried out on a number of binary and multi-class imbalanced data streams indicates the usefulness of the proposed approach.
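The abstract describes a cost matrix that follows the current imbalance ratio over a sliding window, and perceptron outputs that are adapted to compensate for skew. The following is a minimal illustrative sketch of that general idea only, not the authors' implementation: the class `WindowedCosts`, the function `adapt_outputs`, the inverse-frequency cost rule, and the add-one smoothing are all assumptions made for this example.

```python
from collections import Counter, deque


class WindowedCosts:
    """Hypothetical helper: per-class costs from a sliding window of labels."""

    def __init__(self, n_classes, window_size=1000):
        self.n_classes = n_classes
        self.window = deque(maxlen=window_size)
        self.counts = Counter()

    def update(self, label):
        # When the window is full, the oldest label is about to be evicted,
        # so decrement its count first to keep counts in sync with the window.
        if len(self.window) == self.window.maxlen:
            self.counts[self.window[0]] -= 1
        self.window.append(label)
        self.counts[label] += 1

    def costs(self):
        # Assumed rule: cost = largest class count / this class count,
        # with add-one smoothing so unseen classes do not divide by zero.
        smoothed = [self.counts[c] + 1 for c in range(self.n_classes)]
        largest = max(smoothed)
        return [largest / s for s in smoothed]


def adapt_outputs(scores, costs):
    """Rescale non-negative per-class scores by their costs and renormalise."""
    weighted = [s * c for s, c in zip(scores, costs)]
    total = sum(weighted)
    return [w / total for w in weighted]


# Usage: nine majority-class labels followed by one minority-class label.
tracker = WindowedCosts(n_classes=2, window_size=10)
for label in [0] * 9 + [1]:
    tracker.update(label)

adapted = adapt_outputs([0.6, 0.4], tracker.costs())
predicted = max(range(len(adapted)), key=adapted.__getitem__)
```

In this sketch the raw scores favour class 0, but the minority class receives a higher cost, so the adapted outputs favour class 1. Because the counts are tied to the window, the costs shift again if the class proportions in the stream later change.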
Pages: 512-527
Number of pages: 16
Related papers
50 in total
  • [1] Cost-sensitive learning for imbalanced data streams
    Loezer, Lucas
    Enembreck, Fabricio
    Barddal, Jean Paul
    Britto Jr, Alceu de Souza
    [J]. PROCEEDINGS OF THE 35TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING (SAC'20), 2020, : 498 - 504
  • [2] Cost-sensitive multi-layer perceptron for binary classification with imbalanced data
    Liu, Zheng
    Zhang, Sen
    Xiao, Wendong
    Di, Yan
    [J]. 2018 37TH CHINESE CONTROL CONFERENCE (CCC), 2018, : 9614 - 9619
  • [3] Novel Cost-Sensitive Approach to Improve the Multilayer Perceptron Performance on Imbalanced Data
    Castro, Cristiano L.
    Braga, Antonio P.
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2013, 24 (06) : 888 - 899
  • [4] Cost-sensitive sparse group online learning for imbalanced data streams
    Chen, Zhong
    Sheng, Victor
    Edwards, Andrea
    Zhang, Kun
    [J]. MACHINE LEARNING, 2024, 113 (07) : 4407 - 4444
  • [5] Cost-sensitive decision trees applied to medical data
    Freitas, Alberto
    Costa-Pereira, Altamiro
    Brazdil, Pavel
    [J]. DATA WAREHOUSING AND KNOWLEDGE DISCOVERY, PROCEEDINGS, 2007, 4654 : 303 - +
  • [6] Cost-Sensitive Learning Methods for Imbalanced Data
    Nguyen Thai-Nghe
    Gantner, Zeno
    Schmidt-Thieme, Lars
    [J]. 2010 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS IJCNN 2010, 2010,
  • [7] Cost-sensitive boosting for classification of imbalanced data
    Sun, Yanmin
    Kamel, Mohamed S.
    Wong, Andrew K. C.
    Wang, Yang
    [J]. PATTERN RECOGNITION, 2007, 40 (12) : 3358 - 3378
  • [8] Cost-sensitive continuous ensemble kernel learning for imbalanced data streams with concept drift
    Chen, Yingying
    Yang, Xiaowei
    Dai, Hong-Liang
    [J]. KNOWLEDGE-BASED SYSTEMS, 2024, 284
  • [9] Cost-sensitive decision trees with multiple cost scales
    Qin, ZX
    Zhang, SC
    Zhang, CQ
    [J]. AI 2004: ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2004, 3339 : 380 - 390
  • [10] Evolutionary induction of cost-sensitive decision trees
    Kretowski, Marek
    Grzes, Marek
    [J]. FOUNDATIONS OF INTELLIGENT SYSTEMS, PROCEEDINGS, 2006, 4203 : 121 - 126