A Hybrid Under-Sampling Method (HUSBoost) to Classify Imbalanced Data

Authors
Popel, Mahmudul Hasan [1]
Hasib, Khan Md [1]
Habib, Syed Ahsan [1]
Shah, Faisal Muhammad [1]
Affiliation
[1] Ahsanullah Univ Sci & Technol, Dept Comp Sci & Engn, Dhaka, Bangladesh
Keywords
Class Imbalance; Sampling; Cost-sensitive; Tomek-Links; AdaBoost; RUSBoost; EasyEnsemble
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Imbalanced learning is the problem of learning from data whose class distribution is highly skewed. Class imbalance arises in a growing number of domains and challenges traditional classification techniques, and learning from imbalanced data (with two or more classes) adds further complexity. Studies suggest that ensemble methods can produce more accurate results than conventional imbalanced-learning techniques (sampling and cost-sensitive learning). To address the problem, we propose a new hybrid under-sampling-based ensemble approach (HUSBoost) for imbalanced data that consists of three basic steps: data cleaning, data balancing, and classification. First, we remove noisy data using Tomek-Links. Next, we create several balanced subsets by applying random under-sampling (RUS) to the majority-class instances. Each subset combines the under-sampled majority-class instances with the minority-class instances; because it contains the same number of instances from each class, it is balanced. Then, on each balanced subset, random forest (RF), AdaBoost with decision tree (CART), and AdaBoost with support vector machine (SVM) are run in parallel, and soft voting combines their outputs. The results of these ensemble classifiers are averaged over all balanced subsets. We use 27 data-sets with different imbalance ratios to verify the effectiveness of the proposed model and compare its experimental results with those of RUSBoost and EasyEnsemble.
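The cleaning, balancing, and vote-combining steps described in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the function names (`tomek_links`, `remove_tomek_majority`, `balanced_subsets`, `soft_vote`) are hypothetical, Euclidean distance is assumed, and the abstract does not say which member of a Tomek link is removed, so the sketch assumes only the majority-class member is dropped. Training the base classifiers (RF, AdaBoost with CART, AdaBoost with SVM) is left out; `soft_vote` simply averages whatever per-class probability vectors they would supply.

```python
import math
import random


def tomek_links(X, y):
    """Return index pairs (i, j), i < j, that are mutual nearest
    neighbours (Euclidean distance assumed) with different labels."""
    def nearest(i):
        return min((j for j in range(len(X)) if j != i),
                   key=lambda j: math.dist(X[i], X[j]))

    nn = [nearest(i) for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]]


def remove_tomek_majority(X, y, majority):
    """Cleaning step. Assumption: only the majority-class member of
    each Tomek link is treated as noise and removed."""
    drop = {k for pair in tomek_links(X, y) for k in pair
            if y[k] == majority}
    keep = [k for k in range(len(X)) if k not in drop]
    return [X[k] for k in keep], [y[k] for k in keep]


def balanced_subsets(X, y, majority, n_subsets, seed=0):
    """Balancing step: each subset pairs a random under-sample of the
    majority class with every minority-class instance."""
    rng = random.Random(seed)
    maj = [k for k, label in enumerate(y) if label == majority]
    mino = [k for k, label in enumerate(y) if label != majority]
    subsets = []
    for _ in range(n_subsets):
        idx = rng.sample(maj, len(mino)) + mino
        subsets.append(([X[k] for k in idx], [y[k] for k in idx]))
    return subsets


def soft_vote(prob_lists):
    """Soft voting: average the per-class probability vectors that
    several classifiers produce for one instance."""
    n = len(prob_lists)
    n_classes = len(prob_lists[0])
    return [sum(p[c] for p in prob_lists) / n for c in range(n_classes)]
```

In this sketch each balanced subset would be handed to the three base classifiers, `soft_vote` would combine their probability outputs per instance, and the per-subset results would then be averaged, mirroring the order of steps in the abstract.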
Pages: 7