Boosting methods for multi-class imbalanced data classification: an experimental review

Cited by: 158
Authors
Tanha, Jafar [1 ]
Abdi, Yousef [1 ]
Samadi, Negin [1 ]
Razzaghi, Nazila [1 ]
Asadpour, Mohammad [1 ]
Affiliations
[1] Univ Tabriz, Fac Elect & Comp Engn, POB 51666-16471, Tabriz, Iran
Keywords
Boosting algorithms; Imbalanced data; Multi-class classification; Ensemble learning; Multiple classes; Data-sets; Classifiers; Prediction; Regression; Model
DOI
10.1186/s40537-020-00349-y
Chinese Library Classification
TP301 [Theory and Methods]
Discipline Code
081202
Abstract
Since canonical machine learning algorithms assume that the dataset has an equal number of samples in each class, even binary classification becomes a challenging task when minority-class samples must be discriminated efficiently in imbalanced datasets. For this reason, researchers have paid considerable attention to this problem and have proposed many methods to deal with it, which can be broadly categorized into data-level and algorithm-level approaches. Moreover, multi-class imbalanced learning is much harder than the binary case and is still an open problem. Boosting algorithms are a class of ensemble learning methods in machine learning that improve the performance of separate base learners by combining them into a composite whole. This paper aims to review the most significant published boosting techniques for multi-class imbalanced datasets. A thorough empirical comparison is conducted to analyze the performance of binary and multi-class boosting algorithms on various multi-class imbalanced datasets. In addition, based on the obtained results for the performance evaluation metrics and a recently proposed criterion for comparing metrics, the selected metrics are compared to determine a suitable performance metric for multi-class imbalanced datasets. The experimental studies show that the CatBoost and LogitBoost algorithms are superior to the other boosting algorithms on multi-class imbalanced conventional and big datasets, respectively. Furthermore, the MMCC is a better evaluation metric than the MAUC and G-mean in multi-class imbalanced data domains.
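The abstract compares the MMCC, MAUC, and G-mean as evaluation metrics for multi-class imbalanced data. As a rough illustration only (not the paper's own code), the two metrics that can be computed directly from a confusion matrix can be sketched in plain Python; MMCC is taken here to mean the multi-class Matthews correlation coefficient in Gorodkin's generalization, and G-mean the geometric mean of per-class recalls:

```python
import math


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls over the classes in y_true."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        support = sum(1 for t in y_true if t == c)
        recalls.append(tp / support)
    return math.prod(recalls) ** (1.0 / len(recalls))


def multiclass_mcc(y_true, y_pred):
    """Multi-class Matthews correlation coefficient (Gorodkin's R_K)."""
    classes = sorted(set(y_true) | set(y_pred))
    idx = {c: i for i, c in enumerate(classes)}
    k = len(classes)
    # Build the k x k confusion matrix: rows = true class, cols = predicted.
    C = [[0] * k for _ in range(k)]
    for t, p in zip(y_true, y_pred):
        C[idx[t]][idx[p]] += 1
    n = len(y_true)
    correct = sum(C[i][i] for i in range(k))
    t_k = [sum(C[i]) for i in range(k)]                        # true counts
    p_k = [sum(C[i][j] for i in range(k)) for j in range(k)]   # predicted counts
    cov_tp = correct * n - sum(t * p for t, p in zip(t_k, p_k))
    cov_tt = n * n - sum(t * t for t in t_k)
    cov_pp = n * n - sum(p * p for p in p_k)
    if cov_tt == 0 or cov_pp == 0:
        return 0.0  # degenerate case: all labels in a single class
    return cov_tp / math.sqrt(cov_tt * cov_pp)
```

Both metrics reward balanced performance across classes, which is why overall accuracy is a poor substitute on imbalanced data: a classifier that ignores a minority class entirely drives the G-mean to zero even if its accuracy stays high.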
Pages: 47
Related Papers
50 records in total
  • [1] Multi-class Boosting for Imbalanced Data
    Fernandez-Baldera, Antonio
    Buenaposada, Jose M.
    Baumela, Luis
    [J]. PATTERN RECOGNITION AND IMAGE ANALYSIS (IBPRIA 2015), 2015, 9117 : 57 - 64
  • [2] A survey of multi-class imbalanced data classification methods
    Han, Meng
    Li, Ang
    Gao, Zhihui
    Mu, Dongliang
    Liu, Shujuan
    [J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2023, 44 (02) : 2471 - 2501
  • [3] A review of boosting methods for imbalanced data classification
    Li, Qiujie
    Mao, Yaobin
    [J]. PATTERN ANALYSIS AND APPLICATIONS, 2014, 17 (04) : 679 - 693
  • [4] Multi-class imbalanced big data classification on Spark
    Sleeman, William C.
    Krawczyk, Bartosz
    [J]. KNOWLEDGE-BASED SYSTEMS, 2021, 212
  • [5] A Combination Method for Multi-Class Imbalanced Data Classification
    Li, Hu
    Zou, Peng
    Han, Weihong
    Xia, Rongze
    [J]. 2013 10TH WEB INFORMATION SYSTEM AND APPLICATION CONFERENCE (WISA 2013), 2013, : 365 - 368
  • [6] An Experimental Analysis of Drift Detection Methods on Multi-Class Imbalanced Data Streams
    Palli, Abdul Sattar
    Jaafar, Jafreezal
    Gomes, Heitor Murilo
    Hashmani, Manzoor Ahmed
    Gilal, Abdul Rehman
    [J]. APPLIED SCIENCES-BASEL, 2022, 12 (22)
  • [7] Classification of Multi-class Imbalanced Data: Data Difficulty Factors and Selected Methods for Improving Classifiers
    Stefanowski, Jerzy
    [J]. ROUGH SETS (IJCRS 2021), 2021, 12872 : 57 - 72
  • [8] Selecting local ensembles for multi-class imbalanced data classification
    Krawczyk, Bartosz
    Cano, Alberto
    Wozniak, Michal
    [J]. 2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2018