Distance-based arranging oversampling technique for imbalanced data

Cited by: 2
Authors
Dai, Qi [1]
Liu, Jian-wei [1]
Zhao, Jia-Liang [2]
Affiliations
[1] China Univ Petr, Coll Informat Sci & Engn, Dept Automat, 260 Mailbox, Beijing 102249, Peoples R China
[2] North China Univ Sci & Technol, Coll Sci, Tangshan, Peoples R China
Source
NEURAL COMPUTING & APPLICATIONS | 2023, Vol. 35, Issue 02
Keywords
Imbalanced data; Oversampling; Resampling; Distance measurement; SMOTE; Classification; OVER-SAMPLING TECHNIQUE; MAJORITY
DOI
10.1007/s00521-022-07828-8
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Class-imbalanced data sets are common across a wide variety of real-world application areas. The synthetic minority oversampling technique (SMOTE) is an important technique for processing imbalanced data sets. SMOTE requires the user to preset the number of nearest-neighbor instances before synthesizing instances, and this number is often difficult to choose accurately. Moreover, SMOTE tends to synthesize minority instances in majority areas, which degrades classifier performance. To address these issues, this paper proposes a novel distance-based arranging oversampling (DAO) technique. DAO effectively prevents users from selecting inaccurate hyperparameters, and it can be used as an alternative to SMOTE-based oversampling techniques. We further filter the synthesized instances by setting appropriate conditions to avoid generating minority instances in the majority domain. In our experiments, we collect 25 public benchmark data sets from the KEEL and HDDT databases and apply CART and ID3 classification models to the oversampled training set of each data set to assess DAO. Under two evaluation metrics, F-measure and kappa, the proposed method is superior or partially superior to state-of-the-art oversampling techniques.
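To make the two SMOTE issues named in the abstract concrete, the following is a minimal sketch of plain SMOTE-style interpolation followed by a simple rejection filter. It is not the DAO algorithm, whose arranging and filtering rules are not given in this record. The function name `smote_with_filter`, the parameter names, and the filter rule (keep a candidate only if its closest original instance is a minority one) are assumptions made purely for illustration. Note that `k`, the preset neighbor count the abstract calls hard to choose, still has to be supplied here.

```python
import math
import random


def smote_with_filter(minority, majority, k=3, n_new=10, seed=0):
    """SMOTE-style interpolation plus an assumed nearest-instance filter.

    Illustrative only: this is plain SMOTE with a hypothetical rejection
    rule, not DAO. Requires at least two minority instances.
    """
    rng = random.Random(seed)
    synthetic = []
    attempts = 0
    max_attempts = 100 * n_new  # guard against endless rejection
    while len(synthetic) < n_new and attempts < max_attempts:
        attempts += 1
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> partner
        candidate = tuple(b + gap * (q - b) for b, q in zip(base, partner))
        # Assumed filter: reject candidates that land closer to a majority
        # instance than to any minority instance.
        d_min = min(math.dist(candidate, p) for p in minority)
        d_maj = min(math.dist(candidate, p) for p in majority)
        if d_min <= d_maj:
            synthetic.append(candidate)
    return synthetic


# Toy data: one majority point sits in the middle of the minority cluster.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
majority = [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0), (0.5, 0.5)]
new_points = smote_with_filter(minority, majority, k=3, n_new=10, seed=0)
print(len(new_points))
```

In this toy setup, candidates interpolated along a diagonal of the square can fall next to the majority point at (0.5, 0.5) and are discarded, which is the kind of overlap the abstract says unfiltered SMOTE can produce.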
Pages: 1323-1342
Page count: 20
Related papers
50 in total
  • [21] KNNOR: An oversampling technique for imbalanced datasets
    Islam, Ashhadul
    Belhaouari, Samir Brahim
    Rehman, Atiq Ur
    Bensmail, Halima
    [J]. APPLIED SOFT COMPUTING, 2022, 115
  • [22] Perturbation-based oversampling technique for imbalanced classification problems
    Zhang, Jianjun
    Wang, Ting
    Ng, Wing W. Y.
    Pedrycz, Witold
    [J]. INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2023, 14 (03) : 773 - 787
  • [23] IMWMOTE: A novel oversampling technique for fault diagnosis in heterogeneous imbalanced data
    Wang, Jiaxin
    Wei, Jianan
    Huang, Haisong
    Wen, Long
    Yuan, Yage
    Chen, Hualin
    Wu, Rui
    Wu, Jinxing
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2024, 251
  • [24] A Novel Synthetic Minority Oversampling Technique for Imbalanced Data Set Learning
    Barua, Sukarna
    Islam, Md. Monirul
    Murase, Kazuyuki
    [J]. NEURAL INFORMATION PROCESSING, PT II, 2011, 7063 : 735 - +
  • [25] Perturbation-based oversampling technique for imbalanced classification problems
    Jianjun Zhang
    Ting Wang
    Wing W. Y. Ng
    Witold Pedrycz
    [J]. International Journal of Machine Learning and Cybernetics, 2023, 14 : 773 - 787
  • [26] CSMOUTE: Combined Synthetic Oversampling and Undersampling Technique for Imbalanced Data Classification
    Koziarski, Michal
    [J]. 2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021
  • [27] Performance of Synthetic Minority Oversampling Technique on Imbalanced Breast Cancer Data
    Rani, K. Usha
    Ramadevi, G. Naga
    Lavanya, D.
    [J]. PROCEEDINGS OF THE 10TH INDIACOM - 2016 3RD INTERNATIONAL CONFERENCE ON COMPUTING FOR SUSTAINABLE GLOBAL DEVELOPMENT, 2016, : 1623 - 1627
  • [28] Global Data Distribution Weighted Synthetic Oversampling Technique for Imbalanced Learning
    Wang, Zhenfei
    Wang, Hongju
    [J]. IEEE ACCESS, 2021, 9 : 44770 - 44783
  • [29] DBCSMOTE: a clustering-based oversampling technique for data-imbalanced warfarin dose prediction
    Tao, Yanyun
    Zhang, Yuzhen
    Jiang, Bin
    [J]. BMC MEDICAL GENOMICS, 2020, 13 (Suppl 10)
  • [30] Clustering-based improved adaptive synthetic minority oversampling technique for imbalanced data classification
    Jin, Dian
    Xie, Dehong
    Liu, Di
    Gong, Murong
    [J]. INTELLIGENT DATA ANALYSIS, 2023, 27 (03) : 635 - 652