Distance-based arranging oversampling technique for imbalanced data

Cited by: 2
Authors
Dai, Qi [1]
Liu, Jian-wei [1]
Zhao, Jia-Liang [2]
Affiliations
[1] China Univ Petr, Coll Informat Sci & Engn, Dept Automat, 260 Mailbox, Beijing 102249, Peoples R China
[2] North China Univ Sci & Technol, Coll Sci, Tangshan, Peoples R China
Source
NEURAL COMPUTING & APPLICATIONS | 2023 / Vol. 35 / Issue 02
Keywords
Imbalanced data; Oversampling; Resampling; Distance measurement; SMOTE; Classification; OVER-SAMPLING TECHNIQUE; MAJORITY
DOI
10.1007/s00521-022-07828-8
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Class-imbalanced data sets are common in a wide variety of real-world application areas. The synthetic minority oversampling technique (SMOTE) is an important technique for processing imbalanced data sets. SMOTE requires the user to preset the number of nearest-neighbor instances before synthesizing instances, and this value is often difficult to choose accurately. Moreover, SMOTE tends to synthesize minority instances in majority regions, which degrades classifier performance. To address these issues, this paper proposes a novel distance-based arranging oversampling (DAO) technique. DAO effectively keeps users from selecting inaccurate hyperparameters and can serve as an alternative to SMOTE-based oversampling techniques. We further filter the synthesized instances by setting appropriate conditions, to avoid generating minority instances in the majority domain. In our experiments, we collect 25 public benchmark data sets from the KEEL and HDDT databases and apply CART and ID3 classification models to the oversampled training set of each data set to assess DAO. Under two evaluation metrics, F-measure and kappa, the proposed method is superior or partially superior to state-of-the-art oversampling techniques.
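For orientation, the classic SMOTE interpolation step that the abstract criticizes can be sketched as follows. This is a minimal illustration of standard SMOTE only, not the paper's DAO method, whose details the abstract does not give. The function name `smote_sketch`, its parameters, and the toy data are hypothetical and chosen for illustration.

```python
import math
import random


def smote_sketch(minority, k, n_new, seed=0):
    """Classic SMOTE-style interpolation (illustrative; NOT the DAO method).

    minority: list of numeric tuples (minority-class instances)
    k:        number of nearest minority neighbours, the hyperparameter
              the abstract describes as hard to choose
    n_new:    number of synthetic instances to generate
    """
    if not 1 <= k < len(minority):
        raise ValueError("k must satisfy 1 <= k < number of minority instances")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority instances by Euclidean distance to the base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        # Pick one of the k nearest neighbours at random.
        neighbour = rng.choice(others[:k])
        # Place the new instance at a random position on the segment
        # between the base instance and the chosen neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Toy minority class: a small cluster plus one isolated instance.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (5.0, 5.0)]
new_points = smote_sketch(minority, k=2, n_new=6)
```

In this toy data, whenever the isolated instance (5.0, 5.0) is drawn as the base, the synthetic point falls on a segment running toward the cluster. If majority instances occupied that gap, the new minority instance would land among them, which is the failure mode the abstract says DAO's filtering step is meant to avoid.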
Pages: 1323-1342
Page count: 20
Related papers
50 in total
  • [1] Distance-based arranging oversampling technique for imbalanced data
    Qi Dai
    Jian-wei Liu
    Jia-Liang Zhao
    [J]. Neural Computing and Applications, 2023, 35 : 1323 - 1342
  • [2] Evolutionary Mahalanobis Distance-Based Oversampling for Multi-Class Imbalanced Data Classification
    Yao, Leehter
    Lin, Tung-Bin
    [J]. SENSORS, 2021, 21 (19)
  • [3] Fuzzy Distance-based Undersampling Technique for Imbalanced Flood Data
    Mahamud, Ku Ruhana Ku
    Zorkeflee, Maisarah
    Din, Aniza Mohamed
    [J]. PROCEEDINGS OF KNOWLEDGE MANAGEMENT INTERNATIONAL CONFERENCE (KMICE) 2016, 2016, : 509 - 513
  • [5] Classifying imbalanced data in distance-based feature space
    Shin Ando
    [J]. Knowledge and Information Systems, 2016, 46 : 707 - 730
  • [6] A Classification Model for Imbalanced Medical Data based on PCA and Farther Distance based Synthetic Minority Oversampling Technique
    Mustafa, Nadir
    Memon, Raheel A.
    Li, Jian-Ping
    Omer, Mohammed Z.
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2017, 8 (01) : 61 - 67
  • [7] Oversampling technique based on fuzzy representativeness difference for classifying imbalanced data
    Ren, Ruonan
    Yang, Youlong
    Sun, Liqin
    [J]. APPLIED INTELLIGENCE, 2020, 50 (08) : 2465 - 2487
  • [8] Oversampling technique based on fuzzy representativeness difference for classifying imbalanced data
    Ruonan Ren
    Youlong Yang
    Liqin Sun
    [J]. Applied Intelligence, 2020, 50 : 2465 - 2487
  • [9] Radius-SMOTE: A New Oversampling Technique of Minority Samples Based on Radius Distance for Learning From Imbalanced Data
    Pradipta, Gede Angga
    Wardoyo, Retantyo
    Musdholifah, Aina
    Sanjaya, I. Nyoman Hariyasa
    [J]. IEEE ACCESS, 2021, 9 : 74763 - 74777
  • [10] Entropy difference and kernel-based oversampling technique for imbalanced data learning
    Wu, Xu
    Yang, Youlong
    Ren, Lingyu
    [J]. INTELLIGENT DATA ANALYSIS, 2020, 24 (06) : 1239 - 1255