TLUSBoost algorithm: a boosting solution for class imbalance problem

Cited by: 18
Authors
Kumar, Sujit [1]
Biswas, Saroj Kr. [1]
Devi, Debashree [1]
Affiliation
[1] NIT Silchar, Dept Comp Sci & Engn, Silchar, Assam, India
Keywords
Undersampling; Boosting; Data mining; Class imbalance problem; Tomek-link pair; CLASSIFICATION
DOI
10.1007/s00500-018-3629-4
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
It is commonly assumed that the training sets used for learning are balanced. This assumption often fails in real-world applications, where traditional data mining algorithms tend to build suboptimal classification models that are biased towards the overrepresented class. This class imbalance problem arises in many application domains, including data mining, machine learning and pattern recognition, and several techniques have been proposed to alleviate it. RUSBoost is an ensemble learning approach that addresses class imbalance by combining random undersampling (RUS) for data resampling with the AdaBoost technique for boosting. However, RUS may discard significant information from the dataset. This paper therefore proposes the Tomek-link undersampling-based boosting (TLUSBoost) algorithm, which uses Tomek-linked and redundancy-based undersampling (TLRUS) for data resampling and the AdaBoost technique for boosting. TLRUS finds outliers using the Tomek-link concept and then eliminates some of the probable redundant instances from the outliers. The algorithm thus reduces the loss of information and conserves the characteristics of the dataset, which helps the classifier to be trained appropriately. TLUSBoost is validated on 16 benchmark datasets and compared with the EasyEnsemble, BalanceCascade, SMOTEBoost and RUSBoost algorithms. Ten-fold cross-validation is used to measure the overall accuracy and F-measure of the models. Experimental results show that the proposed model outperforms EasyEnsemble, BalanceCascade, SMOTEBoost and RUSBoost in both overall accuracy and F-measure.
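
The Tomek-link concept mentioned in the abstract can be illustrated with a minimal sketch. It uses only the standard definition (two instances of different classes that are each other's nearest neighbour) and the common practice of dropping the majority-class member of each pair. It is not the paper's TLRUS procedure, whose redundancy-based step is not described in this record. The function names `nearest_neighbor`, `tomek_links` and `undersample_tomek` and the toy data are assumptions made for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def nearest_neighbor(points, i):
    """Index of the point closest to points[i], excluding i itself.

    Assumes at least two points; ties go to the lowest index.
    """
    return min(
        (j for j in range(len(points)) if j != i),
        key=lambda j: dist(points[i], points[j]),
    )


def tomek_links(points, labels):
    """Return (i, j) pairs with i < j that form Tomek links.

    A pair is a Tomek link when the two points are mutual nearest
    neighbours and carry different class labels.
    """
    nn = [nearest_neighbor(points, i) for i in range(len(points))]
    return [
        (i, j)
        for i, j in enumerate(nn)
        if i < j and nn[j] == i and labels[i] != labels[j]
    ]


def undersample_tomek(points, labels, majority_label):
    """Drop the majority-class member of every Tomek link (illustrative only)."""
    to_drop = {
        k
        for pair in tomek_links(points, labels)
        for k in pair
        if labels[k] == majority_label
    }
    kept = [k for k in range(len(points)) if k not in to_drop]
    return [points[k] for k in kept], [labels[k] for k in kept]


# Hypothetical toy data: class 0 is the majority, class 1 the minority.
points = [(0.0, 0.0), (0.0, 1.0), (2.0, 0.0), (2.2, 0.0), (5.0, 5.0)]
labels = [0, 0, 0, 1, 1]

links = tomek_links(points, labels)
new_points, new_labels = undersample_tomek(points, labels, majority_label=0)
```

On this toy data, points 2 and 3 are mutual nearest neighbours with different labels, so they form the only Tomek link, and the majority-class point 2 is removed. In a boosting setting such as the one the abstract describes, a resampling step of this kind would be applied to the training data before fitting each weak learner; that wiring is omitted here.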
Pages: 10755-10767
Page count: 13
Related papers
50 in total
  • [21] Solving the class imbalance problem using ensemble algorithm: application of screening for aortic dissection
    Liu, Lijue
    Wu, Xiaoyu
    Li, Shihao
    Li, Yi
    Tan, Shiyang
    Bai, Yongping
    BMC MEDICAL INFORMATICS AND DECISION MAKING, 2022, 22 (01)
  • [22] Handling the Class Imbalance Problem With an Improved Sine Cosine Algorithm for Optimal Instance Selection
    Moorthy, Rajalakshmi Shenbaga
    Selvaraj, Arikumar K.
    Prathiba, Sahaya Beni
    Yenduri, Gokul
    Mohanty, Sachi Nandan
    Ramesh, Janjhyam Venkata Naga
    IEEE ACCESS, 2024, 12 : 87131 - 87151
  • [23] Solving the class imbalance problem using ensemble algorithm: application of screening for aortic dissection
    Lijue Liu
    Xiaoyu Wu
    Shihao Li
    Yi Li
    Shiyang Tan
    Yongping Bai
    BMC Medical Informatics and Decision Making, 22
  • [24] A generalized boosting algorithm and its application to two-class chemical classification problem
    He, P
    Fang, KT
    Liang, YZ
    Li, BY
    ANALYTICA CHIMICA ACTA, 2005, 543 (1-2) : 181 - 191
  • [25] Geometric mean based boosting algorithm with over-sampling to resolve data imbalance problem for bankruptcy prediction
    Kim, Myoung-Jong
    Kang, Dae-Ki
    Kim, Hong Bae
    EXPERT SYSTEMS WITH APPLICATIONS, 2015, 42 (03) : 1074 - 1082
  • [26] Handling Class Imbalance Problem in Cultural Modeling
    Su, Peng
    Mao, Wenji
    Zeng, Daniel
    Li, Xiaochen
    Wang, Fei-Yue
    ISI: 2009 IEEE INTERNATIONAL CONFERENCE ON INTELLIGENCE AND SECURITY INFORMATICS, 2009, : 251 - 256
  • [27] The class imbalance problem in TLC image classification
    Sousa, Antonio V.
    Mendonca, Ana Maria
    Campilho, Aurelio
    IMAGE ANALYSIS AND RECOGNITION, PT 2, 2006, 4142 : 513 - 523
  • [28] Targeting class imbalance problem using GAN
    Bhagwani, Hitesh
    Agarwal, Sonali
    Kodipalli, Ashwini
    Martis, Roshan Joy
    2021 5TH INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONICS, COMMUNICATION, COMPUTER TECHNOLOGIES AND OPTIMIZATION TECHNIQUES (ICEECCOT), 2021, : 318 - 322
  • [29] Alleviating Class Imbalance Problem In Data Mining
    Sarmanova, Akkenzhe
    Albayrak, Songul
    2013 21ST SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2013,
  • [30] Evolutionary data analysis for the class imbalance problem
    Khoshgoftaar, Taghi M.
    Seliya, Naeem
    Drown, Dennis J.
    INTELLIGENT DATA ANALYSIS, 2010, 14 (01) : 69 - 88