OALDPC: oversampling approach based on local density peaks clustering for imbalanced classification

Cited by: 0

Authors:

Junnan Li

Qingsheng Zhu

Affiliation:

[1] Chongqing Industry Polytechnic College, School of Artificial Intelligence and Big Data

Source:

Applied Intelligence | 2023 / Volume 53

Keywords:

Class-imbalanced learning; Class-imbalanced classification; Oversampling technique; Local density peaks; Natural neighbor;

DOI:

Not available

Abstract:

SMOTE has been favored by researchers for improving imbalanced classification. Nevertheless, imbalances within minority classes and noise generation are two main challenges for SMOTE. Recently, clustering-based oversampling methods have been developed to improve SMOTE by eliminating imbalances within minority classes and/or overcoming noise generation. Yet they still suffer from the following challenges: a) some create more synthetic minority samples in large or high-density regions; b) most fail to remove noise from the training set; c) most rely heavily on more than one parameter; d) most cannot handle non-spherical data; e) almost all of the clustering methods they adopt are poorly suited to class-imbalanced data. To overcome these issues of existing clustering-based oversampling methods, this paper proposes a novel oversampling approach based on local density peaks clustering (OALDPC). First, a novel local density peaks clustering (LDPC) method is proposed to partition the class-imbalanced training set into separate sub-clusters with different sizes and densities. Second, a novel LDPC-based noise filter is proposed to identify and remove suspicious noise from the class-imbalanced training set. Third, a novel sampling weight is proposed, calculated by weighing the sample number and density of each minority class sub-cluster. Fourth, a novel interpolation method based on the sampling weight and LDPC is proposed to create more synthetic minority class samples in sparser minority class regions. Extensive experiments show that OALDPC outperforms 8 state-of-the-art oversampling techniques in improving the F-measure and G-mean of Random Forest, Neural Network and XGBoost on synthetic data and on extensive real benchmark data sets from industrial applications.
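The third and fourth steps of the abstract (a per-sub-cluster sampling weight, then interpolation that favors sparser regions) can be illustrated with a rough Python sketch. This is not the OALDPC algorithm: the abstract does not give the weight formula, so the inverse of size times density used in `sampling_weights` is an assumption, the sub-clusters and their densities are taken as given inputs rather than computed by LDPC, and the interpolation is a plain SMOTE-style random-pair scheme inside each sub-cluster. All function names are hypothetical.

```python
import numpy as np


def sampling_weights(sizes, densities):
    """Assumed weight: smaller and sparser sub-clusters get a larger share.

    The inverse of (size * density) is an illustrative choice only; the
    paper's actual weight formula is not stated in the abstract.
    """
    sizes = np.asarray(sizes, dtype=float)
    densities = np.asarray(densities, dtype=float)
    raw = 1.0 / (sizes * densities)
    return raw / raw.sum()


def allocate(weights, n_new):
    """Split n_new synthetic samples across sub-clusters (largest remainder)."""
    exact = weights * n_new
    counts = np.floor(exact).astype(int)
    remainder = int(n_new - counts.sum())
    # Hand the leftover samples to the largest fractional parts.
    order = np.argsort(-(exact - counts))
    counts[order[:remainder]] += 1
    return counts


def interpolate(cluster, n, rng):
    """SMOTE-style interpolation between random pairs of one sub-cluster."""
    cluster = np.asarray(cluster, dtype=float)
    i = rng.integers(0, len(cluster), size=n)
    j = rng.integers(0, len(cluster), size=n)
    gap = rng.random((n, 1))  # position along the segment, in [0, 1)
    return cluster[i] + gap * (cluster[j] - cluster[i])


def oversample(clusters, densities, n_new, seed=0):
    """Create n_new synthetic minority samples, favoring sparse sub-clusters."""
    rng = np.random.default_rng(seed)
    sizes = [len(c) for c in clusters]
    counts = allocate(sampling_weights(sizes, densities), n_new)
    synthetic = [interpolate(c, k, rng) for c, k in zip(clusters, counts)]
    return np.vstack(synthetic), counts
```

Because each synthetic point is a convex combination of two members of the same sub-cluster, it stays inside that sub-cluster's bounding box, which loosely mirrors the abstract's goal of avoiding noise generation between regions.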

Pages: 30987-31017

Number of pages: 30
