An Undersampling Method Approaching the Ideal Classification Boundary for Imbalance Problems

Cited by: 2
Authors
Zhou, Wensheng [1, 2]
Liu, Chen [1, 2]
Yuan, Peng [3]
Jiang, Lei [3]
Affiliations
[1] Natl Key Lab Offshore Oil & Gas Exploitat, Beijing 100028, Peoples R China
[2] CNOOC Res Inst Ltd, Beijing 100028, Peoples R China
[3] Hunan Univ Sci & Technol, Sch Comp Sci & Engn, Xiangtan 411201, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 13
Keywords
classification; cluster-based undersampling; imbalanced problem; optimal number of classifiers
DOI
10.3390/app14135421
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Data imbalance is a common problem in most practical classification applications of machine learning, and if it is not handled properly it can bias classification results towards the majority class. An effective remedy is undersampling in the borderline area; however, it is difficult to find an area that fits the classification boundary. In this paper, we present a novel undersampling framework in which the majority-class samples are clustered and the boundary area is then segmented according to the resulting clusters; random sampling in the borderline area of these segments yields a shape that better fits the classification boundary. In addition, we hypothesize that there is an optimal number of classifiers to integrate when ensemble learning combines multiple classifiers obtained via sampling, and that using this number improves the algorithm. After the hypothesis passes testing, we apply the improved algorithm to the newly developed method. The experimental results show that the proposed method works well.
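The clustering-then-borderline-sampling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not say how the boundary area is segmented, so the sketch assumes plain k-means on the majority class and treats the members of each cluster that lie closest to any minority sample as that cluster's borderline segment. The function names and the parameters `k` and `border_frac` are hypothetical.

```python
import math
import random


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def borderline_undersample(majority, minority, k=3, border_frac=0.5, seed=0):
    """Randomly sample majority points from each cluster's borderline segment."""
    rng = random.Random(seed)
    labels = kmeans(majority, k, rng)
    per_cluster = max(1, len(minority) // k)  # aim for a roughly balanced subset
    selected = []
    for c in range(k):
        members = [p for p, l in zip(majority, labels) if l == c]
        if not members:
            continue
        # Assumed borderline criterion: distance to the nearest minority sample.
        members.sort(key=lambda p: min(math.dist(p, q) for q in minority))
        border = members[:max(1, math.ceil(len(members) * border_frac))]
        selected.extend(rng.sample(border, min(per_cluster, len(border))))
    return selected


# Synthetic two-class data, for illustration only.
data_rng = random.Random(42)
majority = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
minority = [(data_rng.gauss(3, 0.5), data_rng.gauss(3, 0.5)) for _ in range(20)]
subset = borderline_undersample(majority, minority)
```

Calling `borderline_undersample` with different `seed` values gives different majority subsets, each of which could train one member of an ensemble; the number of such members is the quantity the abstract hypothesizes has an optimum.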
Pages: 19