Cluster-based Majority Under-Sampling Approaches for Class Imbalance Learning

Cited by: 40
Authors
Zhang, Yan-Ping [1]
Zhang, Li-Na [1]
Wang, Yong-Cheng [1]
Affiliations
[1] Anhui Univ, Sch Comp Sci & Technol, Hefei 230039, Peoples R China
Keywords
classification; clustering; under-sampling; class imbalance learning
DOI
10.1109/ICIFE.2010.5609385
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The class imbalance problem occurs frequently in real applications: in the training set, one class may have far fewer examples than another. Under-sampling is a popular approach to this problem. It is efficient because it uses only a subset of the majority class. Its drawback is that it discards many potentially useful majority class examples. To overcome this drawback, we adopt an unsupervised learning technique for supervised learning. We propose cluster-based majority under-sampling approaches for selecting a representative subset from the majority class. Compared to random under-sampling, cluster-based under-sampling can effectively avoid losing important majority class information. We adopt two methods to select a representative subset from k clusters in certain proportions, and then use the representative subset together with all minority class samples as training data to improve accuracy on both the minority and majority classes. In the paper, we compare the behavior of our approaches with the traditional random under-sampling approach on ten UCI repository datasets using two classifiers: k-nearest neighbor and Naive Bayes. Recall, Precision, F-measure, G-mean and BACC (balanced accuracy) are used to evaluate classifier performance. Experimental results show that our cluster-based majority under-sampling approaches outperform the random under-sampling approach. Our approaches attain better overall performance with the k-nearest neighbor classifier than with the Naive Bayes classifier.
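The general idea in the abstract (cluster the majority class, then draw a representative subset from the k clusters in proportion) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not specify the clustering algorithm or the two selection methods, so the sketch assumes plain k-means, random selection within each cluster, and quotas proportional to cluster size. The names `kmeans` and `cluster_undersample`, the toy data, and all parameter values are hypothetical.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a cluster index for every point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: initialise from k random points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def cluster_undersample(majority, n_keep, k, seed=0):
    """Keep roughly n_keep majority points, drawn per cluster in proportion to size."""
    rng = random.Random(seed)
    labels = kmeans(majority, k, seed=seed)
    selected = []
    for c in range(k):
        members = [p for p, lab in zip(majority, labels) if lab == c]
        if not members:
            continue
        # Assumed proportion rule: quota scales with the cluster's share of the class.
        quota = max(1, round(n_keep * len(members) / len(majority)))
        selected.extend(rng.sample(members, min(quota, len(members))))
    return selected


# Toy data (hypothetical): 90 majority points in two blobs, 15 minority points.
rng = random.Random(1)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(60)]
majority += [(rng.gauss(8, 1), rng.gauss(8, 1)) for _ in range(30)]
minority = [(rng.gauss(4, 1), rng.gauss(4, 1)) for _ in range(15)]

subset = cluster_undersample(majority, n_keep=len(minority), k=3)
train = subset + minority  # representative majority subset plus all minority samples
```

Because each quota is rounded separately, the subset size is only approximately `n_keep`. Drawing from every cluster is what distinguishes this from random under-sampling, which may leave some regions of the majority class unrepresented.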
Pages: 400 - 404
Number of pages: 5
Related papers
  • [1] A majority affiliation based under-sampling method for class imbalance problem
    Xie, Ying
    Huang, Xian
    Qin, Feng
    Li, Fagen
    Ding, Xuyang
    [J]. INFORMATION SCIENCES, 2024, 662
  • [2] Controlled Under-Sampling with Majority Voting Ensemble Learning for Class Imbalance Problem
    Sikora, Riyaz
    Raina, Sahil
    [J]. INTELLIGENT COMPUTING, VOL 2, 2019, 857 : 33 - 39
  • [3] Cluster-based under-sampling approaches for imbalanced data distributions
    Yen, Show-Jane
    Lee, Yue-Shi
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2009, 36 (03) : 5718 - 5727
  • [4] A Cluster-Based Under-Sampling Algorithm for Class-Imbalanced Data
    Guzman-Ponce, A.
    Valdovinos, R. M.
    Sanchez, J. S.
    [J]. HYBRID ARTIFICIAL INTELLIGENT SYSTEMS, HAIS 2020, 2020, 12344 : 299 - 311
  • [5] Exploratory under-sampling for class-imbalance learning
    Liu, Xu-Ying
    Wu, Jianxin
    Zhou, Zhi-Hua
    [J]. ICDM 2006: SIXTH INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2006 : 965 - 969
  • [6] SVM classifier for unbalanced data based on spectrum cluster-based under-sampling approaches
    Tao, Xin-Min
    Zhang, Dong-Xue
    Hao, Si-Yuan
    Fu, Dan-Dan
    [J]. Kongzhi yu Juece/Control and Decision, 2012, 27 (12) : 1761 - 1768
  • [7] Cluster-based Under-sampling with Random Forest for Multi-Class Imbalanced Classification
    Arafat, Md. Yasir
    Hoque, Sabera
    Farid, Dewan Md.
    [J]. 2017 11TH INTERNATIONAL CONFERENCE ON SOFTWARE, KNOWLEDGE, INFORMATION MANAGEMENT AND APPLICATIONS (SKIMA), 2017
  • [8] CUSBoost: Cluster-based Under-sampling with Boosting for Imbalanced Classification
    Rayhan, Farshid
    Ahmed, Sajid
    Mahbub, Asif
    Jani, Md. Rafsan
    Shatabda, Swakkhar
    Farid, Dewan Md.
    [J]. 2017 2ND INTERNATIONAL CONFERENCE ON COMPUTATIONAL SYSTEMS AND INFORMATION TECHNOLOGY FOR SUSTAINABLE SOLUTION (CSITSS-2017), 2017, : 70 - 75
  • [9] A novel framework for class imbalance learning using intelligent under-sampling
    Naganjaneyulu S.
    Kuppa M.R.
    [J]. Naganjaneyulu, S. (svna2198@gmail.com), 1600, Springer Verlag (02) : 73 - 84
  • [10] Class Imbalance Problem: A Wrapper-Based Approach using Under-Sampling with Ensemble Learning
    Sikora, Riyaz
    Lee, Yoon Sang
    [J]. INFORMATION SYSTEMS FRONTIERS, 2024