Comparing Boosting and Bagging Techniques With Noisy and Imbalanced Data

Cited by: 211
Authors
Khoshgoftaar, Taghi M. [1 ]
Van Hulse, Jason [1 ]
Napolitano, Amri [1 ]
Affiliations
[1] Florida Atlantic Univ, Boca Raton, FL 33431 USA
Keywords
Bagging; binary classification; boosting; class imbalance; class noise; SMOTE;
DOI
10.1109/TSMCA.2010.2084081
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
This paper compares the performance of several boosting and bagging techniques in the context of learning from imbalanced and noisy binary-class data. Noise and class imbalance are two well-established data characteristics encountered in a wide range of data mining and machine learning initiatives. The learning algorithms studied in this paper, which include SMOTEBoost, RUSBoost, Exactly Balanced Bagging, and Roughly Balanced Bagging, combine boosting or bagging with data sampling to make them more effective when data are imbalanced. These techniques are evaluated in a comprehensive suite of experiments, for which nearly four million classification models were trained. All classifiers are assessed using seven different performance metrics, providing a complete perspective on the performance of these techniques, and results are tested for statistical significance via analysis-of-variance modeling. The experiments show that the bagging techniques generally outperform boosting, and hence in noisy data environments, bagging is the preferred method for handling class imbalance.
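To make the idea of combining bagging with data sampling concrete, here is a minimal sketch of Exactly Balanced Bagging as it is commonly described: every bag keeps all minority-class examples and adds an equally sized random subset of the majority class, drawn without replacement. It is not the authors' implementation. The function names, the toy nearest-centroid base learner, and the sample data are all assumptions made for illustration; the abstract does not say which base learners were used.

```python
import random
from collections import Counter


def balanced_bags(X, y, n_bags, minority_label, seed=0):
    """Yield (X_bag, y_bag): all minority rows plus an equal-sized random
    subset of majority rows, drawn without replacement for each bag."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority = [i for i, label in enumerate(y) if label != minority_label]
    for _ in range(n_bags):
        chosen = minority + rng.sample(majority, len(minority))
        yield [X[i] for i in chosen], [y[i] for i in chosen]


def fit_centroids(X, y):
    """Toy base learner (an assumption, not from the paper): class means."""
    counts = Counter(y)
    sums = {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_centroids(model, row):
    """Predict the class whose centroid is nearest in squared distance."""
    def dist2(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(model, key=lambda label: dist2(model[label]))


def fit_ensemble(X, y, n_bags=10, minority_label=1, seed=0):
    """Train one base learner per balanced bag."""
    return [fit_centroids(Xb, yb)
            for Xb, yb in balanced_bags(X, y, n_bags, minority_label, seed)]


def predict_ensemble(models, row):
    """Combine the base learners by simple majority vote."""
    votes = Counter(predict_centroids(m, row) for m in models)
    return votes.most_common(1)[0][0]


# Hypothetical imbalanced data: 20 majority points near the origin,
# 3 minority points near (5, 5).
X = [[0.1 * i, 0.0] for i in range(20)] + [[5.0, 5.0], [5.2, 4.8], [4.9, 5.1]]
y = [0] * 20 + [1] * 3

models = fit_ensemble(X, y, n_bags=10, minority_label=1)
print(predict_ensemble(models, [5.0, 5.0]), predict_ensemble(models, [0.5, 0.0]))
```

Because each base learner sees a balanced bag, no single model is dominated by the majority class, while the vote across bags still draws on most of the majority examples.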
Pages: 552-568
Page count: 17
Related papers
50 in total
  • [1] Online Bagging and Boosting for Imbalanced Data Streams
    Wang, Boyu
    Pineau, Joelle
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2016, 28 (12) : 3353 - 3366
  • [2] Using Boosting and Clustering to Prune Bagging and Detect Noisy Data
    Xie, Yuan-Cheng
    Yang, Jing-Yu
    PROCEEDINGS OF THE 2009 CHINESE CONFERENCE ON PATTERN RECOGNITION AND THE FIRST CJK JOINT WORKSHOP ON PATTERN RECOGNITION, VOLS 1 AND 2, 2009, : 83 - 87
  • [4] Extending Bagging for Imbalanced Data
    Blaszczynski, Jerzy
    Stefanowski, Jerzy
    Idkowiak, Lukasz
    PROCEEDINGS OF THE 8TH INTERNATIONAL CONFERENCE ON COMPUTER RECOGNITION SYSTEMS CORES 2013, 2013, 226 : 269 - 278
  • [5] Neighbourhood sampling in bagging for imbalanced data
    Blaszczynski, Jerzy
    Stefanowski, Jerzy
    NEUROCOMPUTING, 2015, 150 : 529 - 542
  • [6] Actively Balanced Bagging for Imbalanced Data
    Blaszczynski, Jerzy
    Stefanowski, Jerzy
    FOUNDATIONS OF INTELLIGENT SYSTEMS, ISMIS 2017, 2017, 10352 : 271 - 281
  • [8] Comparing Boosting and Bagging for Decision Trees of Rankings
    Plaia, Antonella
    Buscemi, Simona
    Fuernkranz, Johannes
    Mencia, Eneldo Loza
    JOURNAL OF CLASSIFICATION, 2022, 39 (01) : 78 - 99
  • [9] Lazy bagging for classifying imbalanced data
    Zhu, Xingquan
    ICDM 2007: PROCEEDINGS OF THE SEVENTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, 2007, : 763 - 768
  • [10] Bagging and boosting techniques in prediction of particulate matters
    Triana, D.
    Osowski, S.
    BULLETIN OF THE POLISH ACADEMY OF SCIENCES-TECHNICAL SCIENCES, 2020, 68 (05) : 1207 - 1215