TLUSBoost algorithm: a boosting solution for class imbalance problem

Cited by: 18
Authors
Kumar, Sujit [1]
Biswas, Saroj Kr. [1]
Devi, Debashree [1]
Affiliations
[1] NIT Silchar, Dept Comp Sci & Engn, Silchar, Assam, India
Keywords
Undersampling; Boosting; Data mining; Class imbalance problem; Tomek-link pair; Classification
DOI
10.1007/s00500-018-3629-4
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Learning algorithms commonly assume that training sets are balanced. This assumption often fails in real-world applications, and because traditional data mining algorithms tend to build suboptimal models on imbalanced data, the resulting classifiers are biased towards the overrepresented class. This class imbalance problem arises in many areas, including data mining, machine learning and pattern recognition, and several techniques have been proposed to alleviate it. RUSBoost is an ensemble learning approach that addresses class imbalance by combining random undersampling (RUS) for data resampling with AdaBoost for boosting. However, RUS may discard significant information from the dataset. This paper therefore proposes the Tomek-link undersampling-based boosting (TLUSBoost) algorithm, which uses Tomek-linked and redundancy-based undersampling (TLRUS) for data resampling and AdaBoost for boosting. TLRUS identifies outliers using the Tomek-link concept and then eliminates some of the probably redundant instances among them. The algorithm thus reduces the loss of information and preserves the characteristics of the dataset, helping the classifier to be trained appropriately. TLUSBoost is validated on 16 benchmark datasets and compared with the EasyEnsemble, BalanceCascade, SMOTEBoost and RUSBoost algorithms, using ten-fold cross-validation to measure the overall accuracy and F-measure of the models. The experimental results show that the proposed model outperforms EasyEnsemble, BalanceCascade, SMOTEBoost and RUSBoost in both overall accuracy and F-measure.
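For readers unfamiliar with the Tomek-link concept named in the abstract, the following is a minimal sketch of the standard definition only: two instances of opposite classes form a Tomek link when each is the other's nearest neighbour. It is not the authors' TLRUS procedure. The redundancy-based elimination step and the AdaBoost stage are not specified in the abstract and are left out. The function names `find_tomek_links` and `drop_majority_in_links`, the Euclidean distance, and the choice to drop only the majority-class member of each link are assumptions made for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def find_tomek_links(X, y):
    """Return index pairs (i, j), i < j, that form Tomek links.

    X is a list of feature vectors, y the matching class labels.
    Assumes at least two instances; ties go to the lowest index.
    """
    n = len(X)
    nearest = []
    for i in range(n):
        # Index of the closest *other* instance to instance i.
        nearest.append(
            min((j for j in range(n) if j != i),
                key=lambda j: dist(X[i], X[j]))
        )
    # A Tomek link is a mutual nearest-neighbour pair with different labels.
    return [
        (i, j)
        for i, j in enumerate(nearest)
        if i < j and nearest[j] == i and y[i] != y[j]
    ]


def drop_majority_in_links(X, y, majority_label):
    """Undersample by removing the majority-class member of each Tomek link.

    This is one common way to use Tomek links (an assumption here, not
    necessarily what TLRUS does).
    """
    drop = {
        idx
        for pair in find_tomek_links(X, y)
        for idx in pair
        if y[idx] == majority_label
    }
    X_kept = [x for k, x in enumerate(X) if k not in drop]
    y_kept = [label for k, label in enumerate(y) if k not in drop]
    return X_kept, y_kept


if __name__ == "__main__":
    # Toy one-dimensional data: class 0 is the majority class.
    X = [[-1.0], [0.0], [0.2], [1.0], [1.1], [3.0]]
    y = [0, 0, 0, 0, 1, 1]
    print(find_tomek_links(X, y))
    print(drop_majority_in_links(X, y, majority_label=0))
```

In the toy data, the instances at 1.0 (class 0) and 1.1 (class 1) are mutual nearest neighbours with different labels, so they are the only Tomek link. The pairwise search is quadratic in the number of instances, which is acceptable for a sketch but would call for a neighbour index on larger datasets.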
Pages: 10755-10767
Page count: 13
Related papers
  • [1] TLUSBoost algorithm: a boosting solution for class imbalance problem
    Sujit Kumar
    Saroj Kr. Biswas
    Debashree Devi
    Soft Computing, 2019, 23 : 10755 - 10767
  • [2] A Review on Solution to Class Imbalance Problem: Undersampling Approaches
    Devi, Debashree
    Biswas, Saroj K.
    Purkayastha, Biswajit
    2020 INTERNATIONAL CONFERENCE ON COMPUTATIONAL PERFORMANCE EVALUATION (COMPE-2020), 2020, : 626 - 631
  • [3] Techniques Based Upon Boosting to Counter Class Imbalance Problem-A Survey
    Kaur, Prabhjot
    Negi, Vasu
    PROCEEDINGS OF THE 10TH INDIACOM - 2016 3RD INTERNATIONAL CONFERENCE ON COMPUTING FOR SUSTAINABLE GLOBAL DEVELOPMENT, 2016, : 2620 - 2623
  • [4] SVM Classification: Optimization with the SMOTE Algorithm for the Class Imbalance Problem
    Demidova, Liliya
    Klyueva, Irina
    2017 6TH MEDITERRANEAN CONFERENCE ON EMBEDDED COMPUTING (MECO), 2017, : 472 - 475
  • [5] Recursive Tube-Partitioning Algorithm for a Class Imbalance Problem
    Kanchanasuk, Suebkul
    Sinapiromsaran, Krung
    THAI JOURNAL OF MATHEMATICS, 2020, 18 (04): : 2041 - 2051
  • [6] An adaptive fuzzy weight algorithm for the class imbalance learning problem
    Quang V.D.
    Khang T.D.
    International Journal of Intelligent Information and Database Systems, 2024, 16 (03) : 221 - 240
  • [7] The class imbalance problem
    Megahed, Fadel M.
    Chen, Ying-Ju
    Megahed, Aly
    Ong, Yuya
    Altman, Naomi
    Krzywinski, Martin
    NATURE METHODS, 2021, 18 (11) : 1270 - 1272
  • [8] On the Class Imbalance Problem
    Guo, Xinjian
    Yin, Yilong
    Dong, Cailing
    Yang, Gongping
    Zhou, Guangtong
    ICNC 2008: FOURTH INTERNATIONAL CONFERENCE ON NATURAL COMPUTATION, VOL 4, PROCEEDINGS, 2008, : 192 - 201
  • [9] The class imbalance problem
    Fadel M. Megahed
    Ying-Ju Chen
    Aly Megahed
    Yuya Ong
    Naomi Altman
    Martin Krzywinski
    Nature Methods, 2021, 18 : 1270 - 1272
  • [10] Objective Cost-Sensitive-Boosting-WELM for Handling Multi Class Imbalance Problem
    Liu, Zhen
    Tang, Deyu
    Li, Jincheng
    Wang, Ruoyu
    2017 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2017, : 1975 - 1982