Distance-based arranging oversampling technique for imbalanced data

Cited by: 2

Authors:

Dai, Qi [1]

Liu, Jian-wei [1]

Zhao, Jia-Liang [2]

Affiliations:

[1] China Univ Petr, Coll Informat Sci & Engn, Dept Automat, 260 Mailbox, Beijing 102249, Peoples R China

[2] North China Univ Sci & Technol, Coll Sci, Tangshan, Peoples R China

Source:

NEURAL COMPUTING & APPLICATIONS | 2023 / Vol. 35 / Issue 02

Keywords:

Imbalanced data; Oversampling; Resampling; Distance measurement; SMOTE; Classification; OVER-SAMPLING TECHNIQUE; MAJORITY

DOI:

10.1007/s00521-022-07828-8

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Class-imbalanced data sets are common across a wide variety of real-world application areas. The synthetic minority oversampling technique (SMOTE) is an important technique for processing imbalanced data sets. SMOTE requires the user to preset the number of nearest-neighbor instances before synthesizing instances, and this value is often difficult to choose accurately. Moreover, SMOTE tends to synthesize minority instances in majority regions, which degrades classifier performance. To address these issues, this paper proposes a novel distance-based arranging oversampling (DAO) technique. DAO spares users from selecting inaccurate hyperparameters and can be used as an alternative to SMOTE-based oversampling techniques. We further filter the synthesized instances by setting appropriate conditions to avoid generating minority instances in the majority domain. In our experiments, we collect 25 public benchmark data sets from the KEEL and HDDT databases and apply CART and ID3 classification models to the oversampled training set of each data set to assess DAO. Under two evaluation metrics, F-measure and kappa, the proposed method is superior or partially superior to state-of-the-art oversampling techniques.
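For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of SMOTE-style interpolation. It is not the DAO method, whose details are not given in this record. The function name `smote_sketch`, its parameters, and the toy data are hypothetical, chosen only to show where the user-chosen neighbor count `k` enters.

```python
import math
import random


def smote_sketch(minority, k, n_new, seed=0):
    """Generate n_new synthetic points by SMOTE-style interpolation.

    minority: list of feature tuples from the minority class.
    k: number of nearest minority neighbours to consider (chosen by the user).
    """
    if not 1 <= k < len(minority):
        raise ValueError("k must satisfy 1 <= k < number of minority points")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority instance as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority instances by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        # Choose one of the k nearest neighbours at random.
        neighbour = rng.choice(others[:k])
        # Interpolate: new = base + gap * (neighbour - base), with gap in [0, 1).
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical toy minority class with two features.
minority_points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority_points, k=2, n_new=5)
```

Because this sketch looks only at minority neighbors and ignores where majority instances lie, an interpolated point can land in a majority region, which is the weakness the abstract attributes to SMOTE.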


Pages: 1323-1342

Page count: 20

50 items in total

[31] Granular Classification for Imbalanced Datasets: A Minkowski Distance-Based Method
Fu, Chen
Yang, Jianhua
[J]. ALGORITHMS, 2021, 14 (02)
[32] DBCSMOTE: a clustering-based oversampling technique for data-imbalanced warfarin dose prediction
Yanyun Tao
Yuzhen Zhang
Bin Jiang
[J]. BMC Medical Genomics, 13
[33] A Synthetic Minority Oversampling Technique Based on Gaussian Mixture Model Filtering for Imbalanced Data Classification
Xu, Zhaozhao
Shen, Derong
Kou, Yue
Nie, Tiezheng
[J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (03) : 3740 - 3753
[34] An Earth mover's distance-based undersampling approach for handling class-imbalanced data
Rekha, Gillala
Krishna Reddy, V.
Tyagi, Amit Kumar
[J]. International Journal of Intelligent Information and Database Systems, 2020, 13 (2-4) : 376 - 392
[35] Radial-Based oversampling for noisy imbalanced data classification
Koziarski, Michal
Krawczyk, Bartosz
Wozniak, Michal
[J]. NEUROCOMPUTING, 2019, 343 : 19 - 33
[36] Radial-Based Oversampling for Multiclass Imbalanced Data Classification
Krawczyk, Bartosz
Koziarski, Michal
Wozniak, Michal
[J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2020, 31 (08) : 2818 - 2831
[37] Oversampling imbalanced data in the string space
Castellanos, Francisco J.
Valero-Mas, Jose J.
Calvo-Zaragoza, Jorge
Rico-Juan, Juan R.
[J]. PATTERN RECOGNITION LETTERS, 2018, 103 : 32 - 38
[38] Adaptive Oversampling for Imbalanced Data Classification
Ertekin, Seyda
[J]. INFORMATION SCIENCES AND SYSTEMS 2013, 2013, 264 : 261 - 269
[39] Oversampling techniques for imbalanced data in regression
Belhaouari, Samir Brahim
Islam, Ashhadul
Kassoul, Khelil
Al-Fuqaha, Ala
Bouzerdoum, Abdesselam
[J]. EXPERT SYSTEMS WITH APPLICATIONS, 2024, 252
[40] KNNGAN: an oversampling technique for textual imbalanced datasets
Mirmorsal Madani
Homayun Motameni
Hosein Mohamadi
[J]. The Journal of Supercomputing, 2023, 79 : 5291 - 5326
