Distance-based arranging oversampling technique for imbalanced data

Cited by: 2
Authors
Dai, Qi [1]
Liu, Jian-wei [1]
Zhao, Jia-Liang [2]
Affiliations
[1] China Univ Petr, Coll Informat Sci & Engn, Dept Automat, 260 Mailbox, Beijing 102249, Peoples R China
[2] North China Univ Sci & Technol, Coll Sci, Tangshan, Peoples R China
Source
NEURAL COMPUTING & APPLICATIONS | 2023 / Vol. 35 / Issue 02
Keywords
Imbalanced data; Oversampling; Resampling; Distance measurement; SMOTE; Classification; OVER-SAMPLING TECHNIQUE; MAJORITY
DOI
10.1007/s00521-022-07828-8
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Class-imbalanced data sets are common in a wide variety of real-world application areas. The synthetic minority oversampling technique (SMOTE) is an important technique for processing imbalanced data sets. SMOTE requires the user to preset the number of nearest-neighbor instances before synthesizing instances, and this number is often difficult to choose accurately. Moreover, SMOTE tends to synthesize minority instances in majority regions, which degrades classifier performance. To address these issues, this paper proposes a novel distance-based arranging oversampling (DAO) technique. DAO spares users from selecting hyperparameters that are hard to set accurately, and it can be used as an alternative to SMOTE-based oversampling techniques. The synthesized instances are further filtered under appropriate conditions to avoid generating minority instances in the majority domain. In the experiments, 25 public benchmark data sets are collected from the KEEL and HDDT databases, and CART and ID3 classification models are trained on the oversampled training set of each data set to assess DAO. Under two evaluation metrics, F-measure and kappa, the proposed method is superior or partially superior to state-of-the-art oversampling techniques.
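For orientation, the SMOTE baseline that the abstract critiques can be sketched as follows. This is a minimal illustration of standard SMOTE-style interpolation, not the DAO method, whose details are not given in this record. The function name `smote_sketch` and its parameters are hypothetical; `k` is the preset neighbor count that the abstract describes as difficult to choose.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Hypothetical helper: SMOTE-style interpolation among minority points.

    X_min : (n, d) array of minority-class instances (n >= 2 assumed)
    n_new : number of synthetic instances to generate
    k     : number of nearest neighbors, the hyperparameter set by the user
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)  # a point has at most n - 1 other neighbors

    # Pairwise Euclidean distances among minority instances.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # exclude each point from its own neighbors

    # Indices of the k nearest minority neighbors of every point.
    neighbors = np.argsort(dist, axis=1)[:, :k]

    # Pick a random base point and one of its k neighbors, then interpolate
    # at a random position on the segment between them.
    base = rng.integers(0, n, size=n_new)
    nbr = neighbors[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[nbr] - X_min[base])
```

Because each synthetic point lies on a segment between two minority instances, and the majority class is never consulted, such a point can land in a majority region. That is the failure mode the abstract's filtering step is meant to address.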
Pages: 1323-1342
Page count: 20
Related papers
50 in total
  • [31] Granular Classification for Imbalanced Datasets: A Minkowski Distance-Based Method
    Fu, Chen
    Yang, Jianhua
    [J]. ALGORITHMS, 2021, 14 (02)
  • [32] DBCSMOTE: a clustering-based oversampling technique for data-imbalanced warfarin dose prediction
    Yanyun Tao
    Yuzhen Zhang
    Bin Jiang
    [J]. BMC Medical Genomics, 13
  • [33] A Synthetic Minority Oversampling Technique Based on Gaussian Mixture Model Filtering for Imbalanced Data Classification
    Xu, Zhaozhao
    Shen, Derong
    Kou, Yue
    Nie, Tiezheng
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (03) : 3740 - 3753
  • [34] An Earth mover's distance-based undersampling approach for handling class-imbalanced data
    Rekha, Gillala
    Krishna Reddy, V.
    Tyagi, Amit Kumar
    [J]. International Journal of Intelligent Information and Database Systems, 2020, 13 (2-4) : 376 - 392
  • [35] Radial-Based oversampling for noisy imbalanced data classification
    Koziarski, Michal
    Krawczyk, Bartosz
    Wozniak, Michal
    [J]. NEUROCOMPUTING, 2019, 343 : 19 - 33
  • [36] Radial-Based Oversampling for Multiclass Imbalanced Data Classification
    Krawczyk, Bartosz
    Koziarski, Michal
    Wozniak, Michal
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2020, 31 (08) : 2818 - 2831
  • [37] Oversampling imbalanced data in the string space
    Castellanos, Francisco J.
    Valero-Mas, Jose J.
    Calvo-Zaragoza, Jorge
    Rico-Juan, Juan R.
    [J]. PATTERN RECOGNITION LETTERS, 2018, 103 : 32 - 38
  • [38] Adaptive Oversampling for Imbalanced Data Classification
    Ertekin, Seyda
    [J]. INFORMATION SCIENCES AND SYSTEMS 2013, 2013, 264 : 261 - 269
  • [39] Oversampling techniques for imbalanced data in regression
    Belhaouari, Samir Brahim
    Islam, Ashhadul
    Kassoul, Khelil
    Al-Fuqaha, Ala
    Bouzerdoum, Abdesselam
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2024, 252
  • [40] KNNGAN: an oversampling technique for textual imbalanced datasets
    Mirmorsal Madani
    Homayun Motameni
    Hosein Mohamadi
    [J]. The Journal of Supercomputing, 2023, 79 : 5291 - 5326