A-SMOTE: A New Preprocessing Approach for Highly Imbalanced Datasets by Improving SMOTE

Cited by: 25
Authors
Hussein, Ahmed Saad [1, 2]
Li, Tianrui [1]
Yohannese, Chubato Wondaferaw [1]
Bashir, Kamal [1]
Affiliations
[1] Southwest Jiaotong Univ, Sch Informat Sci & Technol, Chengdu 611756, Peoples R China
[2] Univ Informat Technol & Commun, Baghdad 00964, Iraq
Keywords
Imbalanced datasets; SMOTE; Machine learning; Oversampling; Undersampling; SAMPLING METHOD; CLASSIFICATION; CLASSIFIERS; IDENTIFICATION; SYSTEM; NOISY
DOI
10.2991/ijcis.d.191114.002
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Imbalanced learning is a challenging task for most standard machine learning algorithms. The Synthetic Minority Oversampling Technique (SMOTE) is a well-known preprocessing approach for handling imbalanced datasets, in which the minority class is oversampled by producing synthetic examples in feature space rather than data space. However, many recent works have shown that the imbalance ratio is not in itself the problem and that the deterioration of model performance is caused by other factors linked to the distribution of the minority class samples. Blind oversampling by SMOTE leads to two major problems: noisy examples and borderline examples. Noisy examples are those from one class located in the safe zone of the other. Borderline examples are those located in the neighborhood of the class boundary. Both kinds of samples are associated with deteriorating performance of the resulting models. It is therefore critical to concentrate on the structure of the minority class data and to regulate the positioning of the newly introduced minority class samples so that classifiers perform better. Hence, this paper proposes the advanced SMOTE, denoted A-SMOTE, which adjusts the newly introduced minority class examples based on their distance to the original minority class samples. To this end, we first apply the SMOTE algorithm to introduce new samples into the minority class and then eliminate those examples that are closer to the majority class than to the minority class. We apply the proposed method to 44 datasets with various imbalance ratios. Ten widely used data sampling methods selected from the literature are employed for performance comparison. The C4.5 and Naive Bayes classifiers are used for experimental validation. The results confirm the advantage of the proposed method over the other methods on almost all of the datasets and illustrate its suitability for data preprocessing in classification tasks. © 2019 The Authors. Published by Atlantis Press SARL.
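The two-step idea in the abstract (oversample with SMOTE, then discard synthetic examples that lie closer to the majority class than to the minority class) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact distance rule, so the nearest-neighbor Euclidean comparison below is an assumption, and the function names `smote_candidates`, `filter_candidates`, and `nearest_distance` are hypothetical.

```python
import math
import random


def nearest_distance(point, samples):
    """Euclidean distance from point to its closest sample."""
    return min(math.dist(point, s) for s in samples)


def smote_candidates(minority, n_new, k=3, seed=0):
    """Plain SMOTE-style step: interpolate between a minority sample
    and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base sample (excluding itself)
        others = [m for m in minority if m is not base]
        neighbours = sorted(others, key=lambda m: math.dist(base, m))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


def filter_candidates(synthetic, minority, majority):
    """Assumed elimination rule: drop a synthetic point when its nearest
    majority sample is closer than its nearest original minority sample."""
    return [
        s for s in synthetic
        if nearest_distance(s, minority) <= nearest_distance(s, majority)
    ]


if __name__ == "__main__":
    # Toy 2-D data; the last minority point sits inside the majority region.
    minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (2.9, 2.6)]
    majority = [(3.0, 3.0), (3.2, 2.9), (2.8, 3.1), (3.5, 3.4)]
    candidates = smote_candidates(minority, n_new=20)
    kept = filter_candidates(candidates, minority, majority)
    print(f"generated {len(candidates)} candidates, kept {len(kept)}")
```

Measuring distance to the original minority samples only, rather than to other synthetic points, follows the abstract's wording ("based on distance to the original minority class samples"); the choice of nearest-sample distance over, say, a class centroid is an illustrative assumption.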
Pages: 1412-1422
Page count: 11
Related papers
50 in total
  • [1] A-SMOTE: A New Preprocessing Approach for Highly Imbalanced Datasets by Improving SMOTE
    Ahmed Saad Hussein
    Tianrui Li
    Chubato Wondaferaw Yohannese
    Kamal Bashir
    [J]. International Journal of Computational Intelligence Systems, 2019, 12 : 1412 - 1422
  • [2] PF-SMOTE: A novel parameter-free SMOTE for imbalanced datasets
    Chen, Qiong
    Zhang, Zhong-Liang
    Huang, Wen-Po
    Wu, Jian
    Luo, Xing-Gang
    [J]. NEUROCOMPUTING, 2022, 498 : 75 - 88
  • [3] Combination Approach of SMOTE and Biased-SVM for Imbalanced Datasets
    Wang He-Yong
    [J]. 2008 IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-8, 2008, : 228 - 231
  • [4] Preprocessing noisy imbalanced datasets using SMOTE enhanced with fuzzy rough prototype selection
    Verbiest, Nele
    Ramentol, Enislay
    Cornelis, Chris
    Herrera, Francisco
    [J]. APPLIED SOFT COMPUTING, 2014, 22 : 511 - 517
  • [5] Geometric SMOTE for imbalanced datasets with nominal and continuous features
    Fonseca, Joao
    Bacao, Fernando
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2023, 234
  • [6] Learning imbalanced datasets based on SMOTE and Gaussian distribution
    Pan, Tingting
    Zhao, Junhong
    Wu, Wei
    Yang, Jie
    [J]. INFORMATION SCIENCES, 2020, 512 : 1214 - 1233
  • [7] A Modified Borderline Smote with Noise Reduction in Imbalanced Datasets
    Revathi, M.
    Ramyachitra, D.
    [J]. WIRELESS PERSONAL COMMUNICATIONS, 2021, 121 (03) : 1659 - 1680
  • [8] A Modified Borderline Smote with Noise Reduction in Imbalanced Datasets
    M. Revathi
    D. Ramyachitra
    [J]. Wireless Personal Communications, 2021, 121 : 1659 - 1680
  • [9] Improving the Classification Quality of the SVM Classifier for the Imbalanced Datasets on the Base of Ideas the SMOTE Algorithm
    Demidova, Liliya
    Klyueva, Irina
    [J]. 2017 SEMINAR ON SYSTEMS ANALYSIS, 2017, 10
  • [10] IMPROVING IMBALANCED QUESTION CLASSIFICATION USING STRUCTURED SMOTE BASED APPROACH
    Mohasseb, Alaa
    Bader-El-Den, Mohamed
    Cocea, Mihaela
    Liu, Han
    [J]. PROCEEDINGS OF 2018 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS (ICMLC), VOL 2, 2018, : 593 - 597