Comparing Boosting and Bagging Techniques With Noisy and Imbalanced Data

Cited by: 211
Authors
Khoshgoftaar, Taghi M. [1]
Van Hulse, Jason [1]
Napolitano, Amri [1]
Affiliations
[1] Florida Atlantic Univ, Boca Raton, FL 33431 USA
Keywords
Bagging; binary classification; boosting; class imbalance; class noise; SMOTE
DOI
10.1109/TSMCA.2010.2084081
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
This paper compares the performance of several boosting and bagging techniques in the context of learning from imbalanced and noisy binary-class data. Noise and class imbalance are two well-established data characteristics encountered in a wide range of data mining and machine learning initiatives. The learning algorithms studied in this paper, which include SMOTEBoost, RUSBoost, Exactly Balanced Bagging, and Roughly Balanced Bagging, combine boosting or bagging with data sampling to make them more effective when data are imbalanced. These techniques are evaluated in a comprehensive suite of experiments, for which nearly four million classification models were trained. All classifiers are assessed using seven different performance metrics, providing a complete perspective on the performance of these techniques, and results are tested for statistical significance via analysis-of-variance modeling. The experiments show that the bagging techniques generally outperform boosting, and hence in noisy data environments, bagging is the preferred method for handling class imbalance.
Pages: 552-568
Number of pages: 17
Related papers
50 in total
  • [21] An exploration of learning when data is noisy and imbalanced
    Van Hulse, Jason
    Khoshgoftaar, Taghi M.
    Napolitano, Amri
    INTELLIGENT DATA ANALYSIS, 2011, 15 (02) : 215 - 236
  • [22] Boosted RVM algorithm for imbalanced and noisy data
    Qin, Wangchen
    Tong, Mi
    Liu, Fang
    Qi, Quan
    2018 5TH INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND CONTROL ENGINEERING (ICISCE 2018), 2018, : 151 - 155
  • [23] Knowledge discovery from imbalanced and noisy data
    Van Hulse, Jason
    Khoshgoftaar, Taghi
    DATA & KNOWLEDGE ENGINEERING, 2009, 68 (12) : 1513 - 1542
  • [24] A boosting method to detect noisy data
    Liu, XD
    Shi, CY
    Gu, XD
    PROCEEDINGS OF 2005 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-9, 2005, : 2015 - 2020
  • [25] Multi-class Boosting for Imbalanced Data
    Fernandez-Baldera, Antonio
    Buenaposada, Jose M.
    Baumela, Luis
    PATTERN RECOGNITION AND IMAGE ANALYSIS (IBPRIA 2015), 2015, 9117 : 57 - 64
  • [26] Using boosting tree to learn imbalanced data
    Yang Ridong
    Zhang Shiyu
    Li Lin
    Wang Zhe
    Zhou Yi
    The Journal of China Universities of Posts and Telecommunications, 2019, 26 (02) : 43 - 51
  • [27] A review of boosting methods for imbalanced data classification
    Li, Qiujie
    Mao, Yaobin
    PATTERN ANALYSIS AND APPLICATIONS, 2014, 17 (04) : 679 - 693
  • [28] An Imbalanced Data Classification Algorithm Based on Boosting
    Li Qiu-Jie
    Mao Yao-Bin
    Wang Zhi-Quan
    2011 30TH CHINESE CONTROL CONFERENCE (CCC), 2011, : 3053 - 3057
  • [29] A review of boosting methods for imbalanced data classification
    Qiujie Li
    Yaobin Mao
    Pattern Analysis and Applications, 2014, 17 : 679 - 693
  • [30] A New Improved Boosting for Imbalanced Data Classification
    Zhang, Zongtang
    Qiu, JiaXing
    Dai, Weiguo
    2019 THE 5TH INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING, CONTROL AND ROBOTICS (EECR 2019), 2019, 533