A Hybrid Under-Sampling Method (HUSBoost) to Classify Imbalanced Data

Cited by: 0
Authors
Popel, Mahmudul Hasan [1]
Hasib, Khan Md [1]
Habib, Syed Ahsan [1]
Shah, Faisal Muhammad [1]
Affiliations
[1] Ahsanullah Univ Sci & Technol, Dept Comp Sci & Engn, Dhaka, Bangladesh
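Keywords
Class Imbalance; Sampling; Cost-sensitive; Tomek-Links; AdaBoost; RUSBoost; EasyEnsemble
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract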
Imbalanced learning is the problem of learning from data whose class distribution is highly skewed. Class imbalance arises in a growing number of domains and poses a challenge to traditional classification techniques, and learning from imbalanced data, whether with two classes or more, adds further complexity. Studies suggest that ensemble methods can produce more accurate results than conventional imbalanced-learning techniques such as sampling and cost-sensitive learning. We therefore propose a hybrid under-sampling based ensemble approach (HUSBoost) for imbalanced data that consists of three basic steps: data cleaning, data balancing, and classification. First, we remove noisy data using Tomek links. Next, we create several balanced subsets by applying random under-sampling (RUS) to the majority-class instances: each subset combines an under-sampled set of majority-class instances with the minority-class instances, so that both classes contribute the same number of instances. On each balanced subset, random forest (RF), AdaBoost with decision trees (CART), and AdaBoost with support vector machines (SVM) are trained in parallel, and their outputs are combined by soft voting. The results of these ensemble classifiers are then averaged over all balanced subsets. We evaluate the proposed model on 27 data sets with different imbalance ratios and compare its experimental results with those of RUSBoost and EasyEnsemble.
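The three steps named in the abstract (cleaning, balancing, classification) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract names no library, so the use of NumPy and scikit-learn, all function names, the label convention (0 = majority, 1 = minority), and every hyperparameter here are assumptions. In particular, the abstract does not say which member of a Tomek link is discarded; the sketch assumes only the majority-class member is removed.

```python
import numpy as np
from sklearn.ensemble import (AdaBoostClassifier, RandomForestClassifier,
                              VotingClassifier)
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def remove_tomek_links(X, y, majority_label=0):
    """Step 1 (cleaning): drop the majority member of every Tomek link.

    A Tomek link is a pair of mutual nearest neighbours with different labels.
    Removing only the majority member is an assumption of this sketch.
    """
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)          # a point is not its own neighbour
    nn = dist.argmin(axis=1)                # nearest neighbour of each point
    idx = np.arange(len(X))
    in_link = (nn[nn] == idx) & (y != y[nn])
    keep = ~(in_link & (y == majority_label))
    return X[keep], y[keep]


def balanced_subsets(X, y, n_subsets, rng, majority_label=0):
    """Step 2 (balancing): random under-sampling of the majority class."""
    maj = np.flatnonzero(y == majority_label)
    mino = np.flatnonzero(y != majority_label)
    for _ in range(n_subsets):
        picked = rng.choice(maj, size=len(mino), replace=False)
        idx = np.concatenate([picked, mino])
        yield X[idx], y[idx]


def make_voter(seed):
    """Step 3 (classification): RF, AdaBoost+CART, AdaBoost+SVM, soft voting.

    Hyperparameters are illustrative placeholders, not values from the paper.
    The base estimator is passed positionally because its keyword name differs
    across scikit-learn versions.
    """
    return VotingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=50, random_state=seed)),
            ("ada_cart", AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                                            n_estimators=20, random_state=seed)),
            ("ada_svm", AdaBoostClassifier(SVC(probability=True, random_state=seed),
                                           n_estimators=5, random_state=seed)),
        ],
        voting="soft",
    )


def fit_husboost_sketch(X, y, n_subsets=3, seed=0):
    """Clean once, then train one soft-voting ensemble per balanced subset."""
    rng = np.random.default_rng(seed)
    X_clean, y_clean = remove_tomek_links(X, y)
    subsets = balanced_subsets(X_clean, y_clean, n_subsets, rng)
    return [make_voter(seed + i).fit(Xs, ys) for i, (Xs, ys) in enumerate(subsets)]


def predict_proba_minority(models, X):
    """Average the minority-class probability over all subset ensembles."""
    return np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)


if __name__ == "__main__":
    # Synthetic two-class data, purely for demonstration.
    rng = np.random.default_rng(42)
    X = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)),
                   rng.normal(3.0, 1.0, size=(20, 2))])
    y = np.array([0] * 200 + [1] * 20)
    models = fit_husboost_sketch(X, y)
    print(predict_proba_minority(models, X[-5:]))
```

The pairwise distance matrix in `remove_tomek_links` needs memory quadratic in the number of instances, which is acceptable for a sketch but would call for a nearest-neighbour index on larger data sets.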
Pages: 7