OALDPC: oversampling approach based on local density peaks clustering for imbalanced classification

Cited by: 0
Authors
Junnan Li
Qingsheng Zhu
Affiliations
[1] Chongqing Industry Polytechnic College, School of Artificial Intelligence and Big Data
Source
Applied Intelligence | 2023 / Vol. 53
Keywords
Class-imbalanced learning; Class-imbalanced classification; Oversampling technique; Local density peaks; Natural neighbor
DOI
Not available
Abstract
SMOTE is widely used to improve imbalanced classification. Nevertheless, it faces two main challenges: imbalances within minority classes and noise generation. Recently, clustering-based oversampling methods have been developed to improve SMOTE by eliminating imbalances within minority classes and/or overcoming noise generation. Yet they still suffer from the following challenges: a) some create more synthetic minority samples in large or high-density regions; b) most fail to remove noise from the training set; c) most rely heavily on more than one parameter; d) most cannot handle non-spherical data; e) almost all of the adopted clustering methods are poorly suited to class-imbalanced data. To overcome these issues, this paper proposes a novel oversampling approach based on local density peaks clustering (OALDPC). First, a novel local density peaks clustering (LDPC) method is proposed to partition the class-imbalanced training set into separate sub-clusters of different sizes and densities. Second, a novel LDPC-based noise filter is proposed to identify and remove suspicious noise from the class-imbalanced training set. Third, a novel sampling weight is proposed, calculated by weighing the sample number and density of each minority class sub-cluster. Fourth, a novel interpolation method based on the sampling weight and LDPC is proposed to create more synthetic minority class samples in sparser minority class regions. Extensive experiments show that OALDPC outperforms 8 state-of-the-art oversampling techniques in improving the F-measure and G-mean of Random Forest, Neural Network and XGBoost on synthetic data and a wide range of real benchmark data sets from industrial applications.
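The third and fourth steps in the abstract (a per-sub-cluster sampling weight, then interpolation that favors sparser regions) can be illustrated with a rough Python sketch. This is not the OALDPC algorithm: the abstract gives no formulas, so the weight used here (inverse of size times a density proxy), the density proxy itself (inverse mean pairwise distance), and the function names `sparse_first_weights` and `oversample` are all assumptions made for illustration. The sub-clusters are taken as given rather than produced by LDPC, and interpolation is plain SMOTE-style interpolation between two samples of the same sub-cluster.

```python
import numpy as np


def sparse_first_weights(clusters):
    """Hypothetical weight: small, sparse sub-clusters get larger weights.

    Assumption: density is approximated by the inverse of the mean
    pairwise distance inside each sub-cluster.
    """
    sizes = np.array([len(c) for c in clusters], dtype=float)
    densities = np.empty(len(clusters))
    for idx, c in enumerate(clusters):
        n = len(c)
        if n < 2:
            densities[idx] = 1.0  # arbitrary placeholder for singletons
            continue
        diffs = c[:, None, :] - c[None, :, :]
        dists = np.sqrt((diffs ** 2).sum(axis=-1))
        mean_dist = dists.sum() / (n * (n - 1))  # mean over ordered pairs
        densities[idx] = 1.0 / (mean_dist + 1e-12)
    raw = 1.0 / (sizes * densities)
    return raw / raw.sum()


def oversample(clusters, n_new, seed=0):
    """Draw n_new synthetic points, allocated across sub-clusters by weight."""
    rng = np.random.default_rng(seed)
    weights = sparse_first_weights(clusters)
    counts = rng.multinomial(n_new, weights)
    parts = []
    for c, k in zip(clusters, counts):
        # Interpolate between two random samples of the same sub-cluster.
        i = rng.integers(0, len(c), size=k)
        j = rng.integers(0, len(c), size=k)
        gap = rng.random((k, 1))
        parts.append(c[i] + gap * (c[j] - c[i]))
    return np.vstack(parts), counts


# Toy minority class: one large dense sub-cluster, one small sparse one.
rng = np.random.default_rng(42)
dense = rng.normal(0.0, 0.1, size=(50, 2))
sparse = rng.normal(5.0, 1.0, size=(10, 2))
synthetic, counts = oversample([dense, sparse], n_new=40)
print(counts)  # synthetic samples allocated to each sub-cluster
```

In this toy setup most of the synthetic samples should land in the small, sparse sub-cluster, which is the behavior the abstract describes. Because each new point is interpolated within a single sub-cluster, no point is generated in the gap between sub-clusters.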
Pages: 30987 - 31017
Number of pages: 30
Related papers
50 in total
  • [1] OALDPC: oversampling approach based on local density peaks clustering for imbalanced classification
    Li, Junnan
    Zhu, Qingsheng
    APPLIED INTELLIGENCE, 2023, 53 (24) : 30987 - 31017
  • [2] A Novel Oversampling Method for Imbalanced Datasets Based on Density Peaks Clustering
    Cao, Jie
    Shi, Yong
    TEHNICKI VJESNIK-TECHNICAL GAZETTE, 2021, 28 (06) : 1813 - 1819
  • [3] Natural local density-based adaptive oversampling algorithm for imbalanced classification
    Wang, Wentong
    Yang, Lijun
    Zhang, Jinghui
    Yang, Juntao
    Tang, Dongming
    Liu, Tao
    KNOWLEDGE-BASED SYSTEMS, 2024, 295
  • [4] An Adaptive Clustering Algorithm Based on Local-Density Peaks for Imbalanced Data Without Parameters
    Tong, Wuning
    Wang, Yuping
    Liu, Delong
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (04) : 3419 - 3432
  • [5] Local bias approach to the clustering of discrete density peaks
    Desjacques, Vincent
    PHYSICAL REVIEW D, 2013, 87 (04)
  • [6] CDBH: A clustering and density-based hybrid approach for imbalanced data classification
    Mirzaei, Behzad
    Nikpour, Bahareh
    Nezamabadi-pour, Hossein
    EXPERT SYSTEMS WITH APPLICATIONS, 2021, 164
  • [7] A NOVEL RULE-BASED OVERSAMPLING APPROACH FOR IMBALANCED DATA CLASSIFICATION
    Zhang, Xiao
    Paz, Ivan
    Nebot, Angela
    37TH ANNUAL EUROPEAN SIMULATION AND MODELLING CONFERENCE 2023, ESM 2023, 2023 : 208 - 212
  • [8] Local distribution-based adaptive minority oversampling for imbalanced data classification
    Wang, Xinyue
    Xu, Jian
    Zeng, Tieyong
    Jing, Liping
    NEUROCOMPUTING, 2021, 422 : 200 - 213
  • [9] Clustering based on local density peaks and graph cut
    Long, Zhiguo
    Gao, Yang
    Meng, Hua
    Yao, Yuqin
    Li, Tianrui
    INFORMATION SCIENCES, 2022, 600 : 263 - 286
  • [10] ND-S: an oversampling algorithm based on natural neighbor and density peaks clustering
    Guo, Ming
    Lu, Jia
    JOURNAL OF SUPERCOMPUTING, 2023, 79 (08) : 8668 - 8698