OALDPC: oversampling approach based on local density peaks clustering for imbalanced classification

Cited by: 1
Authors
Li, Junnan [1]
Zhu, Qingsheng [1]
Affiliations
[1] Chongqing Ind Polytech Coll, Sch Artificial Intelligence & Big Data, Chongqing 401120, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Class-imbalanced learning; Class-imbalanced classification; Oversampling technique; Local density peaks; Natural neighbor; SAMPLING METHOD; SMOTE; NEIGHBOR;
DOI
10.1007/s10489-023-05030-4
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
SMOTE is widely used to improve imbalanced classification. Nevertheless, it faces two main challenges: imbalances within minority classes and noise generation. Recently, clustering-based oversampling methods have been developed to improve SMOTE by eliminating imbalances within minority classes and/or overcoming noise generation. Yet they still suffer from the following challenges: a) some create more synthetic minority samples in large-size or high-density regions; b) most fail to remove noise from the training set; c) most rely heavily on more than one parameter; d) most cannot handle non-spherical data; e) almost all of the adopted clustering methods are poorly suited to class-imbalanced data. To overcome these issues, this paper proposes a novel oversampling approach based on local density peaks clustering (OALDPC). First, a novel local density peaks clustering (LDPC) method is proposed to partition the class-imbalanced training set into separate sub-clusters with different sizes and densities. Second, a novel LDPC-based noise filter is proposed to identify and remove suspicious noise from the class-imbalanced training set. Third, a novel sampling weight is proposed, calculated by weighing the sample number and density of each minority class sub-cluster. Fourth, a novel interpolation method based on the sampling weight and LDPC is proposed to create more synthetic minority class samples in sparser minority class regions. Extensive experiments show that OALDPC outperforms 8 state-of-the-art oversampling techniques in improving the F-measure and G-mean of Random Forest, Neural Network, and XGBoost on synthetic data and on extensive real benchmark data sets from industrial applications.
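The third and fourth steps of the abstract (a sampling weight built from sub-cluster size and density, then interpolation that favors sparser regions) can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything here is an assumption: the function names `sparsity_weights` and `oversample` are hypothetical, the weight is simply mean pairwise distance divided by sub-cluster size, and the interpolation is plain SMOTE-style linear interpolation between two points of the same sub-cluster. The sub-clusters are taken as given; LDPC and the noise filter are not implemented.

```python
import numpy as np

def sparsity_weights(clusters):
    """Assumed weight: higher for sub-clusters that are small and sparse.

    This is an illustrative stand-in, not the weight defined in the paper.
    """
    scores = []
    for pts in clusters:
        n = len(pts)
        if n > 1:
            # Mean pairwise Euclidean distance as a crude inverse-density proxy.
            diffs = pts[:, None, :] - pts[None, :, :]
            dists = np.sqrt((diffs ** 2).sum(axis=-1))
            mean_dist = dists.sum() / (n * (n - 1))
        else:
            mean_dist = 0.0
        scores.append(mean_dist / n)
    scores = np.asarray(scores, dtype=float)
    total = scores.sum()
    if total == 0.0:
        # Degenerate case: fall back to uniform weights.
        return np.full(len(clusters), 1.0 / len(clusters))
    return scores / total

def oversample(clusters, n_new, seed=0):
    """Create n_new synthetic points, allocated to sub-clusters by weight."""
    rng = np.random.default_rng(seed)
    weights = sparsity_weights(clusters)
    counts = np.floor(weights * n_new).astype(int)
    # Give any remainder left by flooring to the highest-weight sub-clusters.
    remainder = max(0, n_new - int(counts.sum()))
    counts[np.argsort(-weights)[:remainder]] += 1
    synthetic = []
    for pts, k in zip(clusters, counts):
        for _ in range(k):
            if len(pts) < 2:
                synthetic.append(pts[0].copy())
                continue
            # Interpolate between two distinct points of the same sub-cluster.
            i, j = rng.choice(len(pts), size=2, replace=False)
            lam = rng.random()
            synthetic.append(pts[i] + lam * (pts[j] - pts[i]))
    return np.array(synthetic), counts

# Usage: one dense, large sub-cluster and one sparse, small sub-cluster.
rng = np.random.default_rng(42)
dense = rng.normal(0.0, 0.1, size=(20, 2))
sparse = np.array([[5.0, 5.0], [7.0, 5.0], [6.0, 8.0]])
synthetic, counts = oversample([dense, sparse], n_new=10)
```

Under these assumptions, most of the synthetic samples land in the sparse sub-cluster, which mirrors the stated goal of creating more minority samples in sparser regions. Because each synthetic point is interpolated within a single sub-cluster, no point is generated in the gap between sub-clusters.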
Pages: 30987-31017
Page count: 31
Related papers
50 in total
  • [11] ND-S: an oversampling algorithm based on natural neighbor and density peaks clustering
    Ming Guo
    Jia Lu
    The Journal of Supercomputing, 2023, 79 : 8668 - 8698
  • [12] An Ensemble Learning Algorithm Based on Density Peaks Clustering and Fitness for Imbalanced Data
    Xu, Hui
    Liu, Qicheng
    IEEE ACCESS, 2022, 10 : 116120 - 116128
  • [13] A Novel Density Peaks Clustering Algorithm Based on Local Reachability Density
    Hanqing Wang
    Bin Zhou
    Jianyong Zhang
    Ruixue Cheng
    International Journal of Computational Intelligence Systems, 2020, 13 : 690 - 697
  • [14] Adaptive Oversampling via Density Estimation for Online Imbalanced Classification
    Lee, Daeun
    Kim, Hyunjoong
    INFORMATION, 2025, 16 (01)
  • [15] A Novel Density Peaks Clustering Algorithm Based on Local Reachability Density
    Wang, Hanqing
    Zhou, Bin
    Zhang, Jianyong
    Cheng, Ruixue
    INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS, 2020, 13 (01) : 690 - 697
  • [16] Clustering-based improved adaptive synthetic minority oversampling technique for imbalanced data classification
    Jin, Dian
    Xie, Dehong
    Liu, Di
    Gong, Murong
    INTELLIGENT DATA ANALYSIS, 2023, 27 (03) : 635 - 652
  • [17] Hierarchical clustering algorithm based on natural local density peaks
    Cai, Fapeng
    Feng, Ji
    Yang, Degang
    Chen, Zhongshang
    SIGNAL IMAGE AND VIDEO PROCESSING, 2024, 18 (11) : 7989 - 8004
  • [18] Density Peaks Clustering Based on Local Minimal Spanning Tree
    Wang, Renmin
    Zhu, Qingsheng
    IEEE ACCESS, 2019, 7 : 108438 - 108446
  • [19] Model-Based Oversampling for Imbalanced Sequence Classification
    Gong, Zhichen
    Chen, Huanhuan
    CIKM'16: PROCEEDINGS OF THE 2016 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2016, : 1009 - 1018
  • [20] Gaussian Distribution Based Oversampling for Imbalanced Data Classification
    Xie, Yuxi
    Qiu, Min
    Zhang, Haibo
    Peng, Lizhi
    Chen, Zhenxiang
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2022, 34 (02) : 667 - 679