Distance-based arranging oversampling technique for imbalanced data

Times cited: 2

Authors:

Dai, Qi [1]

Liu, Jian-wei [1]

Zhao, Jia-Liang [2]

Affiliations:

[1] China Univ Petr, Coll Informat Sci & Engn, Dept Automat, 260 Mailbox, Beijing 102249, Peoples R China

[2] North China Univ Sci & Technol, Coll Sci, Tangshan, Peoples R China

Source:

NEURAL COMPUTING & APPLICATIONS | 2023 / Vol. 35 / Issue 02

Keywords:

Imbalanced data; Oversampling; Resampling; Distance measurement; SMOTE; Classification; OVER-SAMPLING TECHNIQUE; SMOTE; MAJORITY;

DOI:

10.1007/s00521-022-07828-8

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Class-imbalanced data sets are common in a wide variety of real-world application areas. The synthetic minority oversampling technique (SMOTE) is an important technique for processing imbalanced data sets. SMOTE requires the user to preset the number of nearest neighbors before synthesizing instances, and this number is often difficult to choose accurately. Moreover, SMOTE tends to synthesize minority instances in majority-class regions, which degrades classifier performance. To address these issues, this paper proposes a novel distance-based arranging oversampling (DAO) technique. DAO effectively prevents users from selecting inaccurate hyperparameters and can serve as an alternative to SMOTE-based oversampling techniques. We further filter the synthesized instances by setting appropriate conditions to avoid generating minority instances in the majority domain. In our experiments, we collect 25 public benchmark data sets from the KEEL and HDDT databases and apply CART and ID3 classifiers to the oversampled training set of each data set to assess DAO. Under two evaluation metrics, F-measure and kappa, the proposed method is superior or partially superior to state-of-the-art oversampling techniques.
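The abstract contrasts DAO with two properties of SMOTE: the preset neighbor count k, and the risk of synthesizing minority points inside majority regions. The Python sketch below illustrates those baseline ideas and the two reported metrics; it is not DAO, whose arranging procedure and filtering conditions the abstract does not specify. `smote` is the standard SMOTE interpolation step. `keep_outside_majority` is a hypothetical filter of the kind the abstract describes (keep a synthetic point only if its nearest original instance is a minority one), chosen purely for illustration. The metric functions assume the F1 form of F-measure with the minority class as positive, and Cohen's kappa on a binary confusion matrix.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_new, k=5, seed=0):
    """Standard SMOTE interpolation (needs at least 2 minority instances).

    For each synthetic point: pick a random minority seed, find its k nearest
    minority neighbours, pick one of them at random, and move a random
    fraction of the way from the seed towards it. `k` is the preset
    hyperparameter that the abstract says is hard to choose.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: euclidean(x, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


def keep_outside_majority(synthetic, minority, majority):
    """HYPOTHETICAL filter, not the paper's condition: keep a synthetic point
    only if its nearest original instance belongs to the minority class."""
    kept = []
    for s in synthetic:
        d_min = min(euclidean(s, m) for m in minority)
        d_maj = min(euclidean(s, m) for m in majority)
        if d_min <= d_maj:
            kept.append(s)
    return kept


def f_measure(tp, fp, fn):
    """F1 score with the minority class treated as positive (assumption)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = tp + fp + fn + tn
    p_o = (tp + tn) / n
    # Chance agreement from predicted/actual marginals of both classes.
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (p_o - p_e) / (1 - p_e) if p_e != 1 else 0.0


if __name__ == "__main__":
    minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
    majority = [[3.0, 3.0], [3.2, 2.8], [2.9, 3.1], [3.1, 3.3], [2.7, 2.9]]
    new = smote(minority, n_new=6, k=2)
    kept = keep_outside_majority(new, minority, majority)
    print(len(new), len(kept))
    print(round(f_measure(8, 2, 2), 3), round(cohen_kappa(8, 2, 2, 88), 3))
```

Because every SMOTE point lies on a segment between two minority instances, the choice of k directly controls which segments are eligible; a larger k reaches farther neighbours and raises the chance that a segment crosses a majority region, which is what a post-filter tries to catch.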


Pages: 1323-1342

Number of pages: 20
