A novel synthetic minority oversampling technique based on relative and absolute densities for imbalanced classification

Cited by: 10
Authors
Liu, Ruijuan [1]
Affiliations
[1] Chongqing Jianzhu Coll, Dept Publ Course, Chongqing 400072, Peoples R China
Keywords
Class-imbalance learning; Class-imbalance classification; Oversampling; K nearest neighbors; Relative density; BORDERLINE-SMOTE; SAMPLING METHOD; ALGORITHM;
DOI
10.1007/s10489-022-03512-5
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Learning a classifier from class-imbalanced data is an important challenge. Among existing solutions, SMOTE is widely praised and has an extensive range of practical applications. However, the performance of SMOTE and its extensions usually degrades because of noise generation and within-class imbalance. Although many variants of SMOTE have been developed, few of them solve both problems at the same time. Moreover, many improvements to SMOTE rely on advanced models that introduce external parameters. To address imbalance between and within classes while avoiding noise generation, a novel synthetic minority oversampling technique based on relative and absolute densities (SMOTE-RD) is proposed. First, a novel noise filter based on relative density removes noise and smooths the class boundary. Second, sparsity and boundary weights are calculated from the relative and absolute densities, respectively. Third, normalized weights based on the absolute and sparsity weights are used to generate more synthetic minority-class samples near the class boundary and in sparse regions. The main advantages of the proposed algorithm are that (a) it effectively avoids noise generation while removing noise and smoothing the class boundary in the original data; (b) it generates more synthetic samples at class boundaries and in sparse regions; and (c) it introduces no additional parameters. Extensive experiments show that SMOTE-RD outperforms 7 popular oversampling methods in average AUC, average F-measure and average G-mean on real data sets, at an acceptable time cost.
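The three steps in the abstract (filter noise, weight the remaining minority samples, then interpolate) can be illustrated with a minimal Python sketch. The abstract does not give the density formulas, so everything distance-based here is an assumption for illustration and not the paper's definition: `d_min` and `d_maj` (mean distance to the `k` nearest minority and majority points), the `d_maj / d_min >= 1` keep rule, the two stand-in weights, and the way they are summed and normalized. The function names are hypothetical, and `k` is the usual SMOTE neighbour count.

```python
import math
import random


def mean_knn_distance(point, others, k):
    """Mean Euclidean distance from `point` to its k nearest points in `others`."""
    nearest = sorted(math.dist(point, o) for o in others)[:k]
    return sum(nearest) / len(nearest)


def normalize(values):
    """Scale positive values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def density_weighted_oversample(minority, majority, n_new, k=3, seed=0):
    """Illustrative sketch only: returns (kept minority points, synthetic points)."""
    if len(minority) < 2 or not majority:
        raise ValueError("need at least two minority points and one majority point")
    rng = random.Random(seed)
    eps = 1e-12  # guards against division by zero for duplicate points

    # Step 1 (assumed rule): treat d_maj / d_min as a relative density and
    # drop minority points whose majority neighbours are closer on average.
    kept, d_min_kept, d_maj_kept = [], [], []
    for i, p in enumerate(minority):
        d_min = mean_knn_distance(p, minority[:i] + minority[i + 1:], k) + eps
        d_maj = mean_knn_distance(p, majority, k) + eps
        if d_maj / d_min >= 1.0:
            kept.append(p)
            d_min_kept.append(d_min)
            d_maj_kept.append(d_maj)
    if len(kept) < 2:
        raise ValueError("too few minority points survive the noise filter")

    # Step 2 (assumed stand-ins): sparser points and points nearer the
    # majority class receive larger weights.
    sparsity = normalize(d_min_kept)
    boundary = normalize([1.0 / d for d in d_maj_kept])

    # Step 3: combine and normalize, then sample seed points by weight and
    # interpolate towards a random nearby kept minority point (as in SMOTE).
    weights = normalize([s + b for s, b in zip(sparsity, boundary)])
    synthetic = []
    for idx in rng.choices(range(len(kept)), weights=weights, k=n_new):
        base = kept[idx]
        others = [q for j, q in enumerate(kept) if j != idx]
        neighbours = sorted(others, key=lambda q: math.dist(base, q))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, mate)))
    return kept, synthetic
```

Because each synthetic point lies on a segment between two kept minority points, a minority point sitting inside the majority cluster no longer seeds new samples once the filter drops it, which is the noise-avoidance behaviour the abstract describes.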
Pages: 786-803
Number of pages: 18
Related papers
50 in total
  • [1] A novel synthetic minority oversampling technique based on relative and absolute densities for imbalanced classification
    Ruijuan Liu
    Applied Intelligence, 2023, 53 : 786 - 803
  • [2] A Novel Synthetic Minority Oversampling Technique for Imbalanced Data Set Learning
    Barua, Sukarna
    Islam, Md. Monirul
    Murase, Kazuyuki
    NEURAL INFORMATION PROCESSING, PT II, 2011, 7063 : 735 - +
  • [3] Clustering-based improved adaptive synthetic minority oversampling technique for imbalanced data classification
    Jin, Dian
    Xie, Dehong
    Liu, Di
    Gong, Murong
    INTELLIGENT DATA ANALYSIS, 2023, 27 (03) : 635 - 652
  • [4] A Synthetic Minority Oversampling Technique Based on Gaussian Mixture Model Filtering for Imbalanced Data Classification
    Xu, Zhaozhao
    Shen, Derong
    Kou, Yue
    Nie, Tiezheng
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (03) : 3740 - 3753
  • [5] Imbalanced Classification Based on Minority Clustering Synthetic Minority Oversampling Technique With Wind Turbine Fault Detection Application
    Yi, Huaikuan
    Jiang, Qingchao
    Yan, Xuefeng
    Wang, Bei
    IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2021, 17 (09) : 5867 - 5875
  • [6] A No Parameter Synthetic Minority Oversampling Technique Based on Finch for Imbalanced Data
    Xu, Shoukun
    Li, Zhibang
    Yuan, Baohua
    Yang, Gaochao
    Wang, Xueyuan
    Li, Ning
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, ICIC 2023, PT IV, 2023, 14089 : 367 - 378
  • [7] A Novel Adaptive Minority Oversampling Technique for Improved Classification in Data Imbalanced Scenarios
    Tripathi, Ayush
    Chakraborty, Rupayan
    Kopparapu, Sunil Kumar
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 10650 - 10657
  • [8] A Classification Model for Imbalanced Medical Data based on PCA and Farther Distance based Synthetic Minority Oversampling Technique
    Mustafa, Nadir
    Memon, Raheel A.
    Li, Jian-Ping
    Omer, Mohammed Z.
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2017, 8 (01) : 61 - 67
  • [9] An extension of Synthetic Minority Oversampling Technique based on Kalman filter for imbalanced datasets
    Thejas, G. S.
    Hariprasad, Yashas
    Iyengar, S. S.
    Sunitha, N. R.
    Badrinath, Prajwal
    Chennupati, Shasank
    MACHINE LEARNING WITH APPLICATIONS, 2022, 8
  • [10] An improved and random synthetic minority oversampling technique for imbalanced data
    Wei, Guoliang
    Mu, Weimeng
    Song, Yan
    Dou, Jun
    KNOWLEDGE-BASED SYSTEMS, 2022, 248