A Comprehensive Analysis of Synthetic Minority Oversampling Technique (SMOTE) for handling class imbalance

Cited by: 213
Authors
Elreedy, Dina [1]
Atiya, Amir F. [1]
Affiliations
[1] Cairo Univ, Comp Engn Dept, Giza, Egypt
Keywords
Unbalanced data; Minority class; Over-sampling; Data level; SMOTE; DATA SETS; CLASSIFICATION;
DOI
10.1016/j.ins.2019.07.070
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Imbalanced classification problems are encountered in many applications. The challenge is that the minority class typically has very little data and is often the focus of attention. One approach for handling imbalance is to generate extra data from the minority class to overcome its shortage of data. The Synthetic Minority Over-sampling TEchnique (SMOTE) is one of the dominant methods in the literature for this extra sample generation. It is based on generating examples on the lines connecting a point and one of its K-nearest neighbors. This paper presents a theoretical and experimental analysis of the SMOTE method. We explore how faithfully it emulates the underlying density. To our knowledge, this is the first mathematical analysis of the SMOTE method. Moreover, we analyze the effect of different factors on generation accuracy, such as the dimension, the size of the training set, and the number of neighbors K considered. We also provide a qualitative analysis that examines the factors affecting its accuracy. In addition, we explore the impact of SMOTE on the classification boundary and on classification performance. (C) 2019 Elsevier Inc. All rights reserved.
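The generation step described in the abstract (a synthetic point placed on the line segment between a minority point and one of its K nearest neighbors) can be illustrated with a minimal NumPy sketch. This is not the authors' code. The function name `smote_sketch`, its parameters, the brute-force neighbor search, and the choice to draw base points uniformly at random are all assumptions made for illustration.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Illustrative sketch: create n_new synthetic points from minority samples X_min.

    Hypothetical helper, not a reference implementation of SMOTE.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = X_min.shape[0]
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)  # a point cannot have more than n - 1 neighbors

    # Brute-force pairwise squared Euclidean distances (fine for small n).
    diff = X_min[:, None, :] - X_min[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(d2, np.inf)  # exclude each point from its own neighbor list

    # Indices of the k nearest minority neighbors of every minority point.
    neighbors = np.argsort(d2, axis=1)[:, :k]

    # Assumption: base points are drawn uniformly at random with replacement.
    base = rng.integers(0, n, size=n_new)
    chosen = neighbors[base, rng.integers(0, k, size=n_new)]

    # Interpolate at a uniform random position along each connecting segment.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[chosen] - X_min[base])
```

For example, `smote_sketch(X_minority, n_new=100, k=5)` would return a `(100, d)` array for a `(n, d)` input. Because each synthetic point is a convex combination of two existing minority points, every generated sample stays inside the bounding box of the minority data.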
Pages: 32-64
Number of pages: 33
Related papers
50 in total
  • [1] Note on "A Comprehensive Analysis of Synthetic Minority Oversampling Technique (SMOTE) for handling class imbalance"
    Ferrer, Carlos A.
    Aragon, Efren
    [J]. INFORMATION SCIENCES, 2023, 630 : 322 - 324
  • [2] A Novel Distribution Analysis for SMOTE Oversampling Method in Handling Class Imbalance
    Elreedy, Dina
    Atiya, Amir F.
    [J]. COMPUTATIONAL SCIENCE - ICCS 2019, PT III, 2019, 11538 : 236 - 248
  • [3] A theoretical distribution analysis of synthetic minority oversampling technique (SMOTE) for imbalanced learning
    Elreedy, Dina
    Atiya, Amir F.
    Kamalov, Firuz
    [J]. MACHINE LEARNING, 2024, 113 (07) : 4903 - 4923
  • [4] Synthetic minority oversampling technique for multiclass imbalance problems
    Zhu, Tuanfei
    Lin, Yaping
    Liu, Yonghe
    [J]. PATTERN RECOGNITION, 2017, 72 : 327 - 340
  • [5] Combining Synthetic Minority Oversampling Technique And Subset Feature Selection Technique For Class Imbalance Problem
    Lachheta, Pawan
    Bawa, Seema
    [J]. INTERNATIONAL CONFERENCE ON ADVANCES IN INFORMATION COMMUNICATION TECHNOLOGY & COMPUTING, 2016, 2016,
  • [6] RBM-SMOTE: Restricted Boltzmann Machines for Synthetic Minority Oversampling Technique
    Zieba, Maciej
    Tomczak, Jakub M.
    Gonczarek, Adam
    [J]. INTELLIGENT INFORMATION AND DATABASE SYSTEMS, PT I, 2015, 9011 : 377 - 386
  • [7] Synthetic oversampling with the majority class: A new perspective on handling extreme imbalance
    Sharma, Shiven
    Bellinger, Colin
    Krawczyk, Bartosz
    Zaiane, Osmar
    Japkowicz, Nathalie
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2018, : 447 - 456
  • [8] Effect of Synthetic Minority Oversampling Technique (SMOTE), Feature Representation, and Classification Algorithm on Imbalanced Sentiment Analysis
    Satriaji, Widi
    Kusumaningrum, Retno
    [J]. 2018 2ND INTERNATIONAL CONFERENCE ON INFORMATICS AND COMPUTATIONAL SCIENCES (ICICOS), 2018, : 99 - 103
  • [9] SP-SMOTE: A novel space partitioning based synthetic minority oversampling technique
    Li, Yihong
    Wang, Yunpeng
    Li, Tao
    Li, Beibei
    Lan, Xiaolong
    [J]. KNOWLEDGE-BASED SYSTEMS, 2021, 228
  • [10] CHSMOTE: Convex hull-based synthetic minority oversampling technique for alleviating the class imbalance problem
    Yuan, Xiaohan
    Chen, Shuyu
    Zhou, Han
    Sun, Chuan
    Yuwen, Lu
    [J]. INFORMATION SCIENCES, 2023, 623 : 324 - 341