Comparing Boosting and Bagging Techniques With Noisy and Imbalanced Data

Cited by: 211
Authors
Khoshgoftaar, Taghi M. [1]
Van Hulse, Jason [1]
Napolitano, Amri [1]
Affiliations
[1] Florida Atlantic Univ, Boca Raton, FL 33431 USA
Keywords
Bagging; binary classification; boosting; class imbalance; class noise; SMOTE
DOI
10.1109/TSMCA.2010.2084081
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
This paper compares the performance of several boosting and bagging techniques in the context of learning from imbalanced and noisy binary-class data. Noise and class imbalance are two well-established data characteristics encountered in a wide range of data mining and machine learning initiatives. The learning algorithms studied in this paper, which include SMOTEBoost, RUSBoost, Exactly Balanced Bagging, and Roughly Balanced Bagging, combine boosting or bagging with data sampling to make them more effective when data are imbalanced. These techniques are evaluated in a comprehensive suite of experiments, for which nearly four million classification models were trained. All classifiers are assessed using seven different performance metrics, providing a complete perspective on the performance of these techniques, and results are tested for statistical significance via analysis-of-variance modeling. The experiments show that the bagging techniques generally outperform boosting, and hence in noisy data environments, bagging is the preferred method for handling class imbalance.
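As an illustration only, the data-sampling step that balanced-bagging methods add to ordinary bagging can be sketched as follows. The sketch assumes that each bag keeps every minority-class example and draws an equally sized subset of majority-class examples without replacement, which is one common reading of Exactly Balanced Bagging; the paper's exact procedures may differ. The names `balanced_bags` and `majority_vote` are hypothetical and do not come from the paper.

```python
import random
from collections import Counter


def balanced_bags(labels, n_bags, seed=0):
    """Build index lists for balanced bags over a binary-labelled dataset.

    Assumed scheme (not taken from the paper): every bag contains all
    minority-class indices plus an equally sized random subset of
    majority-class indices, drawn without replacement.
    """
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    # The minority class is the label with the fewest examples.
    minority_label = min(counts, key=counts.get)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    rng = random.Random(seed)
    bags = []
    for _ in range(n_bags):
        # random.sample draws without replacement.
        sampled = rng.sample(majority, len(minority))
        bags.append(minority + sampled)
    return bags


def majority_vote(predictions):
    """Combine per-bag predictions for one example by simple voting."""
    return Counter(predictions).most_common(1)[0][0]


# Toy data: 3 minority examples (label 1) and 17 majority examples (label 0).
labels = [1] * 3 + [0] * 17
bags = balanced_bags(labels, n_bags=5)
```

In a full ensemble, one base classifier would be trained on the examples indexed by each bag, and `majority_vote` would combine their predictions for a new example.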
Pages: 552-568
Number of pages: 17