A novel synthetic minority oversampling technique based on relative and absolute densities for imbalanced classification

Cited by: 10
Authors
Liu, Ruijuan [1]
Affiliations
[1] Chongqing Jianzhu Coll, Dept Publ Course, Chongqing 400072, Peoples R China
Keywords
Class-imbalance learning; Class-imbalance classification; Oversampling; K nearest neighbors; Relative density; BORDERLINE-SMOTE; SAMPLING METHOD; ALGORITHM;
DOI
10.1007/s10489-022-03512-5
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Learning a classifier from class-imbalanced data is an important challenge. Among existing solutions, SMOTE is widely praised and has an extensive range of practical applications. However, SMOTE and its extensions often degrade because of noise generation and within-class imbalance. Although many variants of SMOTE have been developed, few of them solve both problems at the same time. Moreover, many improvements to SMOTE rely on advanced models that introduce external parameters. To address imbalance between and within classes while avoiding noise generation, a novel synthetic minority oversampling technique based on relative and absolute densities is proposed. First, a novel noise filter based on relative density removes noise and smooths the class boundary. Second, sparsity and boundary weights are calculated from relative and absolute densities, respectively. Third, normalized weights based on absolute and sparse weights are used to generate more synthetic minority-class samples near the class boundary and in sparse regions. The main advantages of the proposed algorithm are: (a) it effectively avoids noise generation while removing noise and smoothing the class boundary in the original data; (b) it generates more synthetic samples at class boundaries and in sparse regions; (c) no additional parameters are introduced. Extensive experiments on real data sets show that the proposed method, SMOTE-RD, outperforms 7 popular oversampling methods in average AUC, average F-measure, and average G-mean at an acceptable time cost.
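The third step of the abstract, in which normalized weights steer where synthetic samples are generated, can be illustrated with a minimal sketch. This is not the SMOTE-RD implementation: the abstract does not define the relative and absolute densities, so the per-sample `weights` are treated here as a hypothetical input supplied by the caller, and the function names `k_nearest` and `weighted_smote` are illustrative only. The interpolation itself is the standard SMOTE rule, x_new = x_i + u (x_j - x_i) with u drawn from [0, 1).

```python
import math
import random


def k_nearest(points, i, k):
    """Indices of the k points closest to points[i], excluding i itself."""
    order = sorted(
        (j for j in range(len(points)) if j != i),
        key=lambda j: math.dist(points[i], points[j]),
    )
    return order[:k]


def weighted_smote(minority, weights, n_new, k=5, seed=0):
    """Generate n_new synthetic points by SMOTE-style interpolation.

    `weights` are hypothetical per-sample scores (assumption: larger means
    the sample lies nearer the boundary or in a sparser region). They are
    normalized to probabilities and used to pick seed samples.
    """
    rng = random.Random(seed)
    total = sum(weights)
    probs = [w / total for w in weights]
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        # Pick a seed sample in proportion to its normalized weight.
        i = rng.choices(range(len(minority)), weights=probs, k=1)[0]
        # Pick one of its k nearest minority neighbours uniformly at random.
        j = rng.choice(k_nearest(minority, i, k))
        # Interpolate on the segment between the seed and the neighbour.
        u = rng.random()
        synthetic.append(
            tuple(a + u * (b - a) for a, b in zip(minority[i], minority[j]))
        )
    return synthetic
```

For example, `weighted_smote(points, weights, n_new=10, k=2)` returns ten tuples, each lying on a segment between a minority sample and one of its two nearest minority neighbours. A sample with weight 0 is never chosen as a seed, which loosely mirrors the idea of giving filtered or dense-region samples less influence.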
Pages: 786 - 803
Page count: 18
Related papers
50 in total
  • [21] A new instance density-based synthetic minority oversampling method for imbalanced classification problems
    Ma, Chung-Kang
    Park, You-Jin
    ENGINEERING OPTIMIZATION, 2022, 54 (10) : 1743 - 1757
  • [22] A theoretical distribution analysis of synthetic minority oversampling technique (SMOTE) for imbalanced learning
    Elreedy, Dina
    Atiya, Amir F.
    Kamalov, Firuz
    MACHINE LEARNING, 2024, 113 (07) : 4903 - 4923
  • [23] Minority oversampling for imbalanced time series classification
    Zhu, Tuanfei
    Luo, Cheng
    Zhang, Zhihong
    Li, Jing
    Ren, Siqi
    Zeng, Yifu
    KNOWLEDGE-BASED SYSTEMS, 2022, 247
  • [24] CSMOUTE: Combined Synthetic Oversampling and Undersampling Technique for Imbalanced Data Classification
    Koziarski, Michal
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021
  • [25] A Synthetic Minority Based on Probabilistic Distribution (SyMProD) Oversampling for Imbalanced Datasets
    Kunakorntum, Intouch
    Hinthong, Woranich
    Phunchongharn, Phond
    IEEE ACCESS, 2020, 8 : 114692 - 114704
  • [26] A novel adaptive boundary weighted and synthetic minority oversampling algorithm for imbalanced datasets
    Song, Xudong
    Chen, Yilin
    Liang, Pan
    Wan, Xiaohui
    Cui, Yunxian
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2023, 44 (02) : 3245 - 3259
  • [27] Fuzzy-synthetic minority oversampling technique: Oversampling based on fuzzy set theory for Android malware detection in imbalanced datasets
    Xu, Yanping
    Wu, Chunhua
    Zheng, Kangfeng
    Niu, Xinxin
    Yang, Yixian
    INTERNATIONAL JOURNAL OF DISTRIBUTED SENSOR NETWORKS, 2017, 13 (04)
  • [28] Imbalanced Classification via Feature Dictionary-Based Minority Oversampling
    Park, Minho
    Song, Hwa Jeon
    Kang, Dong-Oh
    IEEE ACCESS, 2022, 10 : 34236 - 34245
  • [29] Probability-Based Synthetic Minority Oversampling Technique
    Altwaijry, Najwa
    IEEE ACCESS, 2023, 11 : 28831 - 28839
  • [30] Optimization of Phishing Website Classification Based on Synthetic Minority Oversampling Technique and Feature Selection
    Prayogo, Rizal Dwi
    Karimah, Siti Amatullah
    2020 5TH INTERNATIONAL WORKSHOP ON BIG DATA AND INFORMATION SECURITY (IWBIS 2020), 2020 : 125 - 130