OALDPC: oversampling approach based on local density peaks clustering for imbalanced classification

Cited by: 0
Authors
Junnan Li
Qingsheng Zhu
Affiliation
[1] Chongqing Industry Polytechnic College, School of Artificial Intelligence and Big Data
Source
Applied Intelligence | 2023 / Vol. 53
Keywords
Class-imbalanced learning; Class-imbalanced classification; Oversampling technique; Local density peaks; Natural neighbor;
DOI
Not available
Abstract
SMOTE has been favored by researchers for improving imbalanced classification. Nevertheless, imbalances within minority classes and noise generation are two main challenges for SMOTE. Clustering-based oversampling methods have recently been developed to improve SMOTE by eliminating imbalances within minority classes and/or overcoming noise generation. Yet they still suffer from the following challenges: a) some create more synthetic minority samples in large-size or high-density regions; b) most fail to remove noise from the training set; c) most rely heavily on more than one parameter; d) most cannot handle non-spherical data; e) almost all of the adopted clustering methods are poorly suited to class-imbalanced data. To overcome these issues of existing clustering-based oversampling methods, this paper proposes a novel oversampling approach based on local density peaks clustering (OALDPC). First, a novel local density peaks clustering (LDPC) method is proposed to partition the class-imbalanced training set into separated sub-clusters with different sizes and densities. Second, a novel LDPC-based noise filter is proposed to identify and remove suspicious noise from the class-imbalanced training set. Third, a novel sampling weight is proposed and calculated by weighing the sample number and density of each minority class sub-cluster. Fourth, a novel interpolation method based on the sampling weight and LDPC is proposed to create more synthetic minority class samples in sparser minority class regions. Extensive experiments show that OALDPC outperforms 8 state-of-the-art oversampling techniques in improving the F-measure and G-mean of Random Forest, Neural Network and XGBoost on synthetic data and a wide range of real benchmark data sets from industrial applications.
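The third and fourth steps of the abstract (a sampling weight per minority sub-cluster, then interpolation that favors sparser regions) can be illustrated with a minimal sketch. It is not the method from the paper: the abstract does not give the weight formula or the LDPC procedure, so the sketch assumes the sub-clusters are already given, takes density to be the inverse of the mean pairwise distance, and uses a weight proportional to 1 / (size * density). The function names `sampling_weights` and `interpolate` are hypothetical.

```python
import math
import random


def sampling_weights(clusters):
    """Return one normalized weight per minority sub-cluster.

    Assumption (not from the paper): density is 1 / mean pairwise distance,
    and the raw weight is 1 / (size * density) = mean_distance / size, so
    smaller and sparser sub-clusters receive larger weights.
    """
    raw = []
    for pts in clusters:
        if len(pts) < 2:
            raise ValueError("each sub-cluster needs at least two points")
        dists = [math.dist(a, b) for i, a in enumerate(pts) for b in pts[i + 1:]]
        mean_dist = sum(dists) / len(dists)
        raw.append(mean_dist / len(pts))
    total = sum(raw)
    if total == 0:
        # All points coincide; fall back to uniform weights.
        return [1.0 / len(clusters)] * len(clusters)
    return [r / total for r in raw]


def interpolate(clusters, weights, n_new, rng):
    """Create n_new synthetic points by SMOTE-style linear interpolation.

    A sub-cluster is drawn according to its weight, then a new point is placed
    at a random position on the segment between two of its members.
    """
    synthetic = []
    for _ in range(n_new):
        pts = rng.choices(clusters, weights=weights, k=1)[0]
        a, b = rng.sample(pts, 2)
        t = rng.random()
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic


if __name__ == "__main__":
    dense = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
    sparse = [(5.0, 5.0), (7.0, 7.0)]
    w = sampling_weights([dense, sparse])
    new_points = interpolate([dense, sparse], w, 10, random.Random(0))
    print(w, len(new_points))
```

Interpolating only between members of the same sub-cluster keeps synthetic points inside a single minority region, which is one common way clustering-based oversamplers try to limit noise generation; whether OALDPC does exactly this is not stated in the abstract.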
Pages: 30987 - 31017
Page count: 30
Related papers
50 in total
  • [31] Perturbation-based oversampling technique for imbalanced classification problems
    Zhang, Jianjun
    Wang, Ting
    Ng, Wing W. Y.
    Pedrycz, Witold
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2023, 14 (03) : 773 - 787
  • [32] Evidence-based adaptive oversampling algorithm for imbalanced classification
    Lin, Chen-ju
    Leony, Florence
    KNOWLEDGE AND INFORMATION SYSTEMS, 2024, 66 (03) : 2209 - 2233
  • [33] Evidence-based adaptive oversampling algorithm for imbalanced classification
    Chen-ju Lin
    Florence Leony
    Knowledge and Information Systems, 2024, 66 : 2209 - 2233
  • [34] A novel oversampling method based on Wasserstein CGAN for imbalanced classification
    Zhou, Hongfang
    Pan, Heng
    Zheng, Kangyun
    Wu, Zongling
    Xiang, Qingyu
    Cybersecurity, 2025, 8 (01)
  • [35] A novel oversampling method based on SeqGAN for imbalanced text classification
    Luo, Yin
    Weng, Xuanlong
    Zheng, Huang
    Feng, Haishan
    Luang, Ke
    2019 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2019, : 2891 - 2894
  • [36] Radial-Based Approach to Imbalanced Data Oversampling
    Koziarski, Michal
    Krawczyk, Bartosz
    Wozniak, Michal
    HYBRID ARTIFICIAL INTELLIGENT SYSTEMS, HAIS 2017, 2017, 10334 : 318 - 327
  • [37] Density Peaks Clustering Based on Weighted Local Density Sequence and Nearest Neighbor Assignment
    Yu, Donghua
    Liu, Guojun
    Guo, Maozu
    Liu, Xiaoyan
    Yao, Shuang
    IEEE ACCESS, 2019, 7 : 34301 - 34317
  • [38] Perturbation-based oversampling technique for imbalanced classification problems
    Jianjun Zhang
    Ting Wang
    Wing W. Y. Ng
    Witold Pedrycz
    International Journal of Machine Learning and Cybernetics, 2023, 14 : 773 - 787
  • [39] Combining Random Subspace Approach with smote Oversampling for Imbalanced Data Classification
    Ksieniewicz, Pawel
    HYBRID ARTIFICIAL INTELLIGENT SYSTEMS, HAIS 2019, 2019, 11734 : 660 - 673
  • [40] Imbalanced Data Classification Based on Clustering
    Li, Hu
    Zou, Peng
    Han, Weihong
    Xia, Rongze
    COMPUTER-AIDED DESIGN, MANUFACTURING, MODELING AND SIMULATION III, 2014, 443 : 741 - 745