A Review on Ensembles for the Class Imbalance Problem: Bagging-, Boosting-, and Hybrid-Based Approaches

Cited by: 1608
Authors
Galar, Mikel [1]
Fernandez, Alberto [2]
Barrenechea, Edurne [1]
Bustince, Humberto [1]
Herrera, Francisco [3]
Affiliations
[1] Univ Publ Navarra, Dept Automat & Computac, Navarra 31006, Spain
[2] Univ Jaen, Dept Comp Sci, Jaen 23071, Spain
[3] Univ Granada, Dept Comp Sci & Artificial Intelligence, E-18071 Granada, Spain
Keywords
Bagging; boosting; class distribution; classification; ensembles; imbalanced data-sets; multiple classifier systems; SUPPORT VECTOR MACHINES; STATISTICAL COMPARISONS; NEURAL-NETWORKS; DECISION TREES; CLASSIFICATION; CLASSIFIERS; PERFORMANCE; STRATEGIES; VARIANCE; ACCURACY;
DOI
10.1109/TSMCC.2011.2161285
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Classifier learning with data-sets that suffer from imbalanced class distributions is a challenging problem in the data mining community. This issue occurs when the number of examples that represent one class is much lower than that of the other classes. Its presence in many real-world applications has attracted growing attention from researchers. In machine learning, ensembles of classifiers are known to increase the accuracy of single classifiers by combining several of them, but none of these learning techniques alone solves the class imbalance problem; to deal with this issue, ensemble learning algorithms have to be designed specifically. In this paper, our aim is to review the state of the art on ensemble techniques in the framework of imbalanced data-sets, with a focus on two-class problems. We propose a taxonomy for ensemble-based methods that address class imbalance, in which each proposal is categorized according to the inner ensemble methodology on which it is based. In addition, we develop a thorough empirical comparison of the most significant published approaches within the families of the proposed taxonomy, to show whether any of them makes a difference. This comparison has shown the good behavior of the simplest approaches, which combine random undersampling techniques with bagging or boosting ensembles. In addition, the positive synergy between sampling techniques and bagging has stood out. Furthermore, our results show empirically that ensemble-based algorithms are worthwhile, since they outperform the mere use of preprocessing techniques before learning the classifier, thereby justifying the increase in complexity by means of a significant enhancement of the results.
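The combination of random undersampling with bagging that the abstract highlights can be illustrated with a minimal sketch. It is not the implementation of any specific algorithm evaluated in the paper. The function names (`train_stump`, `under_bagging`, `predict`), the decision-stump base learner, the 0/1 label encoding, and the toy data are all assumptions made for illustration.

```python
import random
from collections import Counter


def train_stump(X, y):
    """Fit a one-feature threshold classifier by exhaustive search (assumed base learner)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for sign in (1, -1):
                preds = [1 if sign * (row[j] - t) >= 0 else 0 for row in X]
                errors = sum(p != label for p, label in zip(preds, y))
                if best is None or errors < best[0]:
                    best = (errors, j, t, sign)
    _, j, t, sign = best
    return lambda row: 1 if sign * (row[j] - t) >= 0 else 0


def under_bagging(X, y, n_estimators=11, seed=0):
    """Train each ensemble member on a class-balanced random subset of the data."""
    rng = random.Random(seed)
    minority = Counter(y).most_common()[-1][0]  # least frequent label
    min_idx = [i for i, label in enumerate(y) if label == minority]
    maj_idx = [i for i, label in enumerate(y) if label != minority]
    models = []
    for _ in range(n_estimators):
        # Random undersampling: keep every minority example and draw an
        # equally sized majority subset without replacement.
        idx = min_idx + rng.sample(maj_idx, len(min_idx))
        models.append(train_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict(models, row):
    """Combine the members by simple majority vote."""
    votes = Counter(model(row) for model in models)
    return votes.most_common(1)[0][0]


# Toy two-class data (hypothetical): 20 majority examples (label 0)
# and 4 minority examples (label 1) on a single feature.
X = [[i / 10] for i in range(20)] + [[3.0], [3.1], [3.2], [3.3]]
y = [0] * 20 + [1] * 4

models = under_bagging(X, y)
print(predict(models, [3.5]), predict(models, [0.5]))  # 1 0
```

Each member sees a different majority subset, so the ensemble as a whole can use more of the majority class than any single balanced sample would. Variants in the literature may instead bootstrap the minority class or alter the sampling ratio per member.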
Pages: 463-484
Number of pages: 22
Related papers
8 in total
  • [1] A Novel Hybrid-Based Ensemble for Class Imbalance Problem
    Guo, Huaping
    Zhou, Jun
    Wu, Chang-an
    She, Wei
    [J]. INTERNATIONAL JOURNAL ON ARTIFICIAL INTELLIGENCE TOOLS, 2018, 27 (06)
  • [2] Hybrid Ensembles of Decision Trees and Bayesian Network for Class Imbalance Problem
    Ruangthong, Pumitara
    Jaiyen, Saichon
    [J]. 2016 8TH INTERNATIONAL CONFERENCE ON KNOWLEDGE AND SMART TECHNOLOGY (KST), 2016, : 39 - 42
  • [3] A Review on Solution to Class Imbalance Problem: Undersampling Approaches
    Devi, Debashree
    Biswas, Saroj K.
    Purkayastha, Biswajit
    [J]. 2020 INTERNATIONAL CONFERENCE ON COMPUTATIONAL PERFORMANCE EVALUATION (COMPE-2020), 2020, : 626 - 631
  • [4] Techniques Based Upon Boosting to Counter Class Imbalance Problem-A Survey
    Kaur, Prabhjot
    Negi, Vasu
    [J]. PROCEEDINGS OF THE 10TH INDIACOM - 2016 3RD INTERNATIONAL CONFERENCE ON COMPUTING FOR SUSTAINABLE GLOBAL DEVELOPMENT, 2016, : 2620 - 2623
  • [5] Hybrid Approach Redefinition (HAR) model for optimizing hybrid ensembles in handling class imbalance: a review and research framework
    Hartono, Hartono
    Sitompul, Opim Salim
    Tulus, Tulus
    Nababan, Erna Budhiarti
    Napitupulu, Darmawan
    [J]. 3RD ANNUAL APPLIED SCIENCE AND ENGINEERING CONFERENCE (AASEC 2018), 2018, 197
  • [6] A Review of Fuzzy and Pattern-Based Approaches for Class Imbalance Problems
    Lin, Ismael
    Loyola-Gonzalez, Octavio
    Monroy, Raul
    Medina-Perez, Miguel Angel
    [J]. APPLIED SCIENCES-BASEL, 2021, 11 (14)
  • [7] A Boosting-Aided Adaptive Cluster-Based Undersampling Approach for Treatment of Class Imbalance Problem
    Devi, Debashree
    Namasudra, Suyel
    Kadry, Seifedine
    [J]. INTERNATIONAL JOURNAL OF DATA WAREHOUSING AND MINING, 2020, 16 (03) : 60 - 86
  • [8] A Comprehensive Review on Optimal Location and Sizing of Reactive Power Compensation Using Hybrid-Based Approaches for Power Loss Reduction, Voltage Stability Improvement, Voltage Profile Enhancement and Loadability Enhancement
    Ismail, Bazilah
    Abdul Wahab, Noor Izzri
    Othman, Mohammad Lutfi
    Radzi, Mohd Amran Mohd
    Naidu Vijyakumar, Kanendra
    Mat Naain, Muhammad Najwan
    [J]. IEEE ACCESS, 2020, 8 : 222733 - 222765