Cluster-based Majority Under-Sampling Approaches for Class Imbalance Learning

Cited by: 40
Authors
Zhang, Yan-Ping [1]
Zhang, Li-Na [1]
Wang, Yong-Cheng [1]
Affiliations
[1] Anhui Univ, Sch Comp Sci & Technol, Hefei 230039, Peoples R China
Keywords
classification; clustering; under-sampling; class imbalance learning
DOI
10.1109/ICIFE.2010.5609385
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
The class imbalance problem is common in real applications: in the training set, one class may have far fewer examples than another. Under-sampling is a very popular approach to this problem. It is efficient because it uses only a subset of the majority class. Its drawback is that it throws away many potentially useful majority class examples. To overcome this drawback, we adopt an unsupervised learning technique for supervised learning. We propose cluster-based majority under-sampling approaches for selecting a representative subset from the majority class. Compared to conventional under-sampling, cluster-based under-sampling can effectively avoid losing important information from the majority class. We adopt two methods to select a representative subset from k clusters in certain proportions, and then use the representative subset together with all minority class samples as training data to improve accuracy over both the minority and majority classes. In the paper, we compare the behavior of our approaches with the traditional random under-sampling approach on ten UCI repository datasets using two classifiers: k-nearest neighbor and Naive Bayes. Recall, Precision, F-measure, G-mean and BACC (balanced accuracy) are used to evaluate classifier performance. Experimental results show that our cluster-based majority under-sampling approaches outperform the random under-sampling approach. Our approaches attain better overall performance with the k-nearest neighbor classifier than with the Naive Bayes classifier.
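The general idea in the abstract (cluster the majority class, draw a representative subset from the clusters, then train on that subset plus all minority samples) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not describe their two selection methods, so the proportional-to-cluster-size random draw used here is an assumption, and the names `kmeans`, `cluster_undersample` and `g_mean_and_bacc` are hypothetical. The metric function uses the standard definitions of G-mean and balanced accuracy, which the abstract names but does not spell out.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on tuples of floats; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers: k distinct input points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center (squared Euclidean).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each non-empty cluster's center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def cluster_undersample(majority, n_keep, k=3, seed=0):
    """Select roughly n_keep majority points, each cluster contributing
    in proportion to its size (ASSUMPTION: not the paper's exact rule)."""
    rng = random.Random(seed)
    labels = kmeans(majority, k, seed=seed)
    selected = []
    for c in range(k):
        members = [p for p, lab in zip(majority, labels) if lab == c]
        if not members:
            continue
        quota = max(1, round(n_keep * len(members) / len(majority)))
        selected.extend(rng.sample(members, min(quota, len(members))))
    return selected


def g_mean_and_bacc(tp, fn, tn, fp):
    """Standard definitions: G-mean = sqrt(TPR * TNR), BACC = (TPR + TNR) / 2."""
    tpr = tp / (tp + fn)  # recall on the positive (minority) class
    tnr = tn / (tn + fp)  # recall on the negative (majority) class
    return (tpr * tnr) ** 0.5, (tpr + tnr) / 2


if __name__ == "__main__":
    # Synthetic data for illustration only: 60 majority points, 10 minority points.
    rng = random.Random(1)
    majority = [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
                for cx, cy in [(0, 0), (8, 0), (0, 8)] for _ in range(20)]
    minority = [(rng.gauss(4, 1.0), rng.gauss(4, 1.0)) for _ in range(10)]
    subset = cluster_undersample(majority, n_keep=len(minority), k=3)
    training_set = [(p, 0) for p in subset] + [(p, 1) for p in minority]
    print(len(subset), len(training_set))
```

The resulting `training_set` would then be passed to a classifier such as k-nearest neighbor or Naive Bayes; the choice of `k` and of the per-cluster proportion are tuning decisions this sketch leaves open.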
Pages: 400-404
Page count: 5