Comparing Boosting and Bagging Techniques With Noisy and Imbalanced Data

Cited by: 211
Authors
Khoshgoftaar, Taghi M. [1]
Van Hulse, Jason [1]
Napolitano, Amri [1]
Affiliations
[1] Florida Atlantic Univ, Boca Raton, FL 33431 USA
Keywords
Bagging; binary classification; boosting; class imbalance; class noise; SMOTE
DOI
10.1109/TSMCA.2010.2084081
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
This paper compares the performance of several boosting and bagging techniques in the context of learning from imbalanced and noisy binary-class data. Noise and class imbalance are two well-established data characteristics encountered in a wide range of data mining and machine learning initiatives. The learning algorithms studied in this paper, which include SMOTEBoost, RUSBoost, Exactly Balanced Bagging, and Roughly Balanced Bagging, combine boosting or bagging with data sampling to make them more effective when data are imbalanced. These techniques are evaluated in a comprehensive suite of experiments, for which nearly four million classification models were trained. All classifiers are assessed using seven different performance metrics, providing a complete perspective on the performance of these techniques, and results are tested for statistical significance via analysis-of-variance modeling. The experiments show that the bagging techniques generally outperform boosting, and hence in noisy data environments, bagging is the preferred method for handling class imbalance.
Pages: 552-568
Page count: 17
Related papers
50 in total
  • [41] Comparing pure parallel ensemble creation techniques against bagging
    Hall, LO
    Bowyer, KW
    Banfield, RE
    Bhadoria, D
    THIRD IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2003, : 533 - 536
  • [42] AveBoost2: Boosting for noisy data
    Oza, NC
    MULTIPLE CLASSIFIER SYSTEMS, PROCEEDINGS, 2004, 3077 : 31 - 40
  • [43] Tackling Overfitting in Boosting for Noisy Healthcare Data
    Park, Yubin
    Ho, Joyce C.
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2021, 33 (07) : 2995 - 3006
  • [44] Boosting support vector machines for imbalanced data sets
    Wang, Benjamin X.
    Japkowicz, Nathalie
    KNOWLEDGE AND INFORMATION SYSTEMS, 2010, 25 (01) : 1 - 20
  • [45] Boosting support vector machines for imbalanced data sets
    Wang, Benjamin X.
    Japkowicz, Nathalie
    FOUNDATIONS OF INTELLIGENT SYSTEMS, PROCEEDINGS, 2008, 4994 : 38 - 47
  • [46] MEBoost: Mixing Estimators with Boosting for Imbalanced Data Classification
    Rayhan, Farshid
    Ahmed, Sajid
    Mahbub, Asif
    Jani, Md. Rafsan
    Shatabda, Swakkhar
    Farid, Dewan Md.
    Rahman, Chowdhury Mofizur
    2017 11TH INTERNATIONAL CONFERENCE ON SOFTWARE, KNOWLEDGE, INFORMATION MANAGEMENT AND APPLICATIONS (SKIMA), 2017,
  • [47] Boosting support vector machines for imbalanced data sets
    Benjamin X. Wang
    Nathalie Japkowicz
    Knowledge and Information Systems, 2010, 25 : 1 - 20
  • [48] Boosting Mobile Apps under Imbalanced Sensing Data
    Zhang, Xinglin
    Yang, Zheng
    Shangguan, Longfei
    Liu, Yunhao
    Chen, Lei
    IEEE TRANSACTIONS ON MOBILE COMPUTING, 2015, 14 (06) : 1151 - 1161
  • [49] Boosting imbalanced data learning with Wiener process oversampling
    Qian Li
    Gang Li
    Wenjia Niu
    Yanan Cao
    Liang Chang
    Jianlong Tan
    Li Guo
    Frontiers of Computer Science, 2017, 11 : 836 - 851
  • [50] Boosting imbalanced data learning with Wiener process oversampling
    Li, Qian
    Li, Gang
    Niu, Wenjia
    Cao, Yanan
    Chang, Liang
    Tan, Jianlong
    Guo, Li
    FRONTIERS OF COMPUTER SCIENCE, 2017, 11 (05) : 836 - 851