An Undersampling Method Approaching the Ideal Classification Boundary for Imbalance Problems

Cited by: 2
Authors
Zhou, Wensheng [1, 2]
Liu, Chen [1, 2]
Yuan, Peng [3]
Jiang, Lei [3]
Affiliations
[1] Natl Key Lab Offshore Oil & Gas Exploitat, Beijing 100028, Peoples R China
[2] CNOOC Res Inst Ltd, Beijing 100028, Peoples R China
[3] Hunan Univ Sci & Technol, Sch Comp Sci & Engn, Xiangtan 411201, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 13
Keywords
classification; cluster-based undersampling; imbalanced problem; optimal number of classifiers
DOI
10.3390/app14135421
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Data imbalance is a common problem in practical machine learning classification, and if it is not handled properly it can bias classification results towards the majority class. Undersampling in the borderline area is an effective remedy; however, it is difficult to find the area that fits the classification boundary. In this paper, we present a novel undersampling framework that first clusters the majority-class samples and then segments the boundary area according to the resulting clusters; random sampling within the borderline area of these segments yields a shape that better fits the classification boundary. In addition, we hypothesize that there is an optimal number of classifiers to integrate when ensemble learning is used to combine the multiple classifiers obtained via sampling. After the hypothesis passes testing, we apply the improved ensemble algorithm to the newly developed method. The experimental results show that the proposed method works well.
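The abstract gives only an outline of the framework, so the following is a minimal illustrative sketch of the general idea (cluster the majority class, then sample randomly from each cluster's borderline segment) and not the authors' algorithm. The function names, the use of plain k-means, the `border_frac` parameter, and the "distance to the nearest minority sample" definition of the borderline area are all assumptions made for illustration; the ensemble step is not covered.

```python
import math
import random


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns a cluster label for each point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Move each center to the mean of its members (unchanged if empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def borderline_undersample(majority, minority, k=3, border_frac=0.5, seed=0):
    """Cluster the majority class, then randomly sample from each cluster's
    borderline segment (assumed here: the members closest to the minority)."""
    rng = random.Random(seed)
    labels = kmeans(majority, k, rng)
    per_cluster = max(1, len(minority) // k)  # aim for a roughly balanced set
    kept = []
    for c in range(k):
        members = [p for p, lab in zip(majority, labels) if lab == c]
        if not members:
            continue
        # Rank members by distance to their nearest minority sample.
        members.sort(key=lambda p: min(math.dist(p, m) for m in minority))
        border = members[: max(1, int(len(members) * border_frac))]
        kept.extend(rng.sample(border, min(per_cluster, len(border))))
    return kept


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic 2-D data: 200 majority points, 20 minority points.
    majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
    minority = [(rng.gauss(2.5, 0.5), rng.gauss(2.5, 0.5)) for _ in range(20)]
    sampled = borderline_undersample(majority, minority)
    print(len(majority), "->", len(sampled), "majority samples kept")
```

Sampling per cluster rather than over the whole majority class keeps every region of the boundary represented, which is one plausible reading of how segmentation helps the sampled set follow the boundary's shape.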
Pages: 19