Neighbourhood sampling in bagging for imbalanced data

Cited by: 135
Authors
Blaszczynski, Jerzy [1]
Stefanowski, Jerzy [1]
Affiliations
[1] Poznan Univ Tech, Inst Comp Sci, PL-60965 Poznan, Poland
Keywords
Class imbalance; Ensemble classifiers; Bagging; IDENTIFICATION; CLASSIFICATION;
DOI
10.1016/j.neucom.2014.07.064
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Various approaches to extending bagging ensembles for class-imbalanced data are considered. First, we review known extensions and compare them in a comprehensive experimental study. The results show that integrating bagging with under-sampling is more powerful than integrating it with over-sampling. They also identify Roughly Balanced Bagging as the most accurate extension. Then, we point out that a complex and difficult distribution of the minority class can be handled by analyzing the content of the neighbourhood of examples. In our study we show that taking into account such local characteristics of the minority class distribution can be useful both for analyzing the performance of ensembles with respect to data difficulty factors and for proposing new generalizations of bagging. We demonstrate this by proposing Neighbourhood Balanced Bagging, in which the sampling probabilities of examples are modified according to the class distribution in their neighbourhood. Two versions are considered: the first keeps a larger size of bootstrap samples through hybrid over-sampling, and the other reduces this size with stronger under-sampling. Experiments show that the first version is significantly better than existing over-sampling bagging extensions, while the other version is competitive with Roughly Balanced Bagging. Finally, we demonstrate that detecting types of minority examples depending on their neighbourhood may help explain why some ensembles work better for imbalanced data than others. © 2014 Elsevier B.V. All rights reserved.
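The idea of modifying sampling probabilities according to the class distribution in each example's neighbourhood can be illustrated with a minimal sketch. It is not the authors' exact method: the weighting rule (a minority example gets weight 1 plus the fraction of majority examples among its k nearest neighbours, and a majority example gets the minority-to-majority ratio), the function names, and the default k=5 are all assumptions chosen for illustration.

```python
import math
import random


def majority_fraction(X, y, i, k, minority_label):
    """Fraction of the k nearest neighbours of example i that are majority."""
    # Euclidean distances from example i to every other example.
    dists = sorted((math.dist(X[i], X[j]), j) for j in range(len(X)) if j != i)
    neighbours = [j for _, j in dists[:k]]
    return sum(y[j] != minority_label for j in neighbours) / len(neighbours)


def neighbourhood_weights(X, y, k=5, minority_label=1):
    """Illustrative (assumed) weighting, not the formula from the paper.

    Minority examples surrounded by majority examples get larger weights;
    majority examples are down-weighted by the class ratio.
    """
    n_min = sum(label == minority_label for label in y)
    n_maj = len(y) - n_min
    weights = []
    for i, label in enumerate(y):
        if label == minority_label:
            weights.append(1.0 + majority_fraction(X, y, i, k, minority_label))
        else:
            weights.append(n_min / n_maj)
    return weights


def weighted_bootstrap(weights, size, rng):
    """Draw example indices with replacement, proportionally to the weights."""
    return rng.choices(range(len(weights)), weights=weights, k=size)
```

In a bagging ensemble built this way, each base classifier would be trained on the examples selected by one call to `weighted_bootstrap`. Choosing a `size` equal to the original data set size loosely corresponds to the variant that keeps larger bootstrap samples, and a smaller `size` to the stronger under-sampling variant.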
Pages: 529 - 542
Number of pages: 14
Related papers
50 in total
  • [1] Diversity Analysis on Imbalanced Data Using Neighbourhood and Roughly Balanced Bagging Ensembles
    Blaszczynski, Jerzy
    Lango, Mateusz
    [J]. ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING, ICAISC 2016, 2016, 9692 : 552 - 562
  • [2] Extending Bagging for Imbalanced Data
    Blaszczynski, Jerzy
    Stefanowski, Jerzy
    Idkowiak, Lukasz
    [J]. PROCEEDINGS OF THE 8TH INTERNATIONAL CONFERENCE ON COMPUTER RECOGNITION SYSTEMS CORES 2013, 2013, 226 : 269 - 278
  • [3] Deterministic Sampling Classifier with weighted Bagging for drifted imbalanced data stream classification
    Klikowski, Jakub
    Wozniak, Michal
    [J]. APPLIED SOFT COMPUTING, 2022, 122
  • [4] Actively Balanced Bagging for Imbalanced Data
    Blaszczynski, Jerzy
    Stefanowski, Jerzy
    [J]. FOUNDATIONS OF INTELLIGENT SYSTEMS, ISMIS 2017, 2017, 10352 : 271 - 281
  • [5] Lazy bagging for classifying imbalanced data
    Zhu, Xingquan
    [J]. ICDM 2007: PROCEEDINGS OF THE SEVENTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, 2007, : 763 - 768
  • [6] Evolutionary under-sampling based bagging ensemble method for imbalanced data classification
    Bo Sun
    Haiyan Chen
    Jiandong Wang
    Hua Xie
    [J]. Frontiers of Computer Science, 2018, 12 : 331 - 350
  • [7] Evolutionary under-sampling based bagging ensemble method for imbalanced data classification
    Sun, Bo
    Chen, Haiyan
    Wang, Jiandong
    Xie, Hua
    [J]. FRONTIERS OF COMPUTER SCIENCE, 2018, 12 (02) : 331 - 350
  • [8] Abstaining in rule set bagging for imbalanced data
    Napierala, Krystyna
    Stefanowski, Jerzy
    [J]. LOGIC JOURNAL OF THE IGPL, 2015, 23 (03) : 421 - 430
  • [9] Online Bagging and Boosting for Imbalanced Data Streams
    Wang, Boyu
    Pineau, Joelle
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2016, 28 (12) : 3353 - 3366
  • [10] Comparing Boosting and Bagging Techniques With Noisy and Imbalanced Data
    Khoshgoftaar, Taghi M.
    Van Hulse, Jason
    Napolitano, Amri
    [J]. IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART A-SYSTEMS AND HUMANS, 2011, 41 (03) : 552 - 568