SMOTE: Synthetic minority over-sampling technique

Cited by: 15889
Authors
Chawla, Nitesh V. [1]
Bowyer, Kevin W. [2]
Hall, Lawrence O. [1]
Kegelmeyer, W. Philip [3]
Affiliations
[1] Department of Computer Science and Engineering, ENB 118, University of South Florida, 4202 E. Fowler Ave, Tampa, FL 33620-5399, United States
[2] Department of Computer Science and Engineering, 384 Fitzpatrick Hall, University of Notre Dame, Notre Dame, IN 46556, United States
[3] Sandia National Laboratories, Biosystems Research Department, MS 9951, P.O. Box 969, Livermore, CA, United States
Source
2002 / American Association for Artificial Intelligence / Volume 16
Keywords
Artificial intelligence - Classification (of information) - Error analysis - Performance - Sensitivity analysis
DOI
10.1613/jair.953
Abstract
An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Often real-world datasets are predominantly composed of normal examples with only a small percentage of abnormal or interesting examples. It is also the case that the cost of misclassifying an abnormal (interesting) example as a normal example is often much higher than the cost of the reverse error. Under-sampling of the majority (normal) class has been proposed as a good means of increasing the sensitivity of a classifier to the minority class. This paper shows that a combination of our method of over-sampling the minority (abnormal) class and under-sampling the majority (normal) class can achieve better classifier performance (in ROC space) than only under-sampling the majority class. This paper also shows that a combination of our method of over-sampling the minority class and under-sampling the majority class can achieve better classifier performance (in ROC space) than varying the loss ratios in Ripper or class priors in Naive Bayes. Our method of over-sampling the minority class involves creating synthetic minority class examples. Experiments are performed using C4.5, Ripper and a Naive Bayes classifier. The method is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy. © 2002 AI Access Foundation and Morgan Kaufmann Publishers. All rights reserved.
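The abstract names the two ingredients (synthetic over-sampling of the minority class and under-sampling of the majority class) without giving the procedure. The following is a minimal sketch of one common way to realize them, not the authors' reference implementation. It assumes that a synthetic example is a random point on the line segment between a minority sample and one of its k nearest minority-class neighbours, and that under-sampling is uniform random selection. The function names `smote` and `undersample`, the parameters `n_synthetic`, `k`, `n_keep`, and `seed`, and the toy data are all hypothetical and chosen for illustration only. For simplicity the base sample is drawn at random rather than by iterating over every minority sample.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority samples
    and their k nearest minority-class neighbours (Euclidean distance).
    Assumption: this interpolation scheme is a sketch, not the paper's code."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Brute-force nearest neighbours among the other minority samples.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Random point on the segment from base to the chosen neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def undersample(majority, n_keep, seed=0):
    """Randomly keep n_keep majority samples, without replacement."""
    return random.Random(seed).sample(majority, n_keep)


# Hypothetical toy data: 4 minority points and 25 majority points.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
majority = [(float(x), float(y)) for x in range(5, 10) for y in range(5, 10)]

new_minority = minority + smote(minority, n_synthetic=8, k=3)
new_majority = undersample(majority, n_keep=12)
print(len(new_minority), len(new_majority))  # prints: 12 12
```

In this toy run the class ratio moves from 4:25 to 12:12. Because every synthetic point is a convex combination of two existing minority samples, it stays inside the region the minority class already occupies, whereas over-sampling by plain duplication would only repeat existing points.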
Related papers
  • [1] Handling Autism Imbalanced Data using Synthetic Minority Over-Sampling Technique (SMOTE)
    El-Sayed, Asmaa Ahmed
    Meguid, Nagwa Abdel
    Mahmood, Mahmood Abdel Manem
    Hefny, Hesham Ahmed
    PROCEEDINGS OF 2015 THIRD IEEE WORLD CONFERENCE ON COMPLEX SYSTEMS (WCCS), 2015
  • [2] FLEX-SMOTE: Synthetic over-sampling technique that flexibly adjusts to different minority class distributions
    Bunkhumpornpat, Chumphol
    Boonchieng, Ekkarat
    Chouvatut, Varin
    Lipsky, David
    Patterns, 2024, 5 (11)
  • [3] LVQ-SMOTE - Learning Vector Quantization based Synthetic Minority Over-sampling Technique for biomedical data
    Nakamura, Munehiro
    Kajiwara, Yusuke
    Otsuka, Atsushi
    Kimura, Haruhiko
    BIODATA MINING, 2013, 6
  • [4] DBSMOTE: Density-Based Synthetic Minority Over-sampling TEchnique
    Bunkhumpornpat, Chumphol
    Sinapiromsaran, Krung
    Lursinsap, Chidchanok
    APPLIED INTELLIGENCE, 2012, 36 (03) : 664 - 684
  • [6] Safe-Level-SMOTE: Safe-Level-Synthetic Minority Over-Sampling TEchnique for Handling the Class Imbalanced Problem
    Bunkhumpornpat, Chumphol
    Sinapiromsaran, Krung
    Lursinsap, Chidchanok
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS, 2009, 5476 : 475 - 482
  • [7] Multi-fidelity model based on synthetic minority over-sampling technique
    Jiuxiang Song
    Jizhong Liu
    Multimedia Tools and Applications, 2024, 83 : 33123 - 33139
  • [8] Imbalanced data classification using improved synthetic minority over-sampling technique
    Anusha, Yamijala
    Visalakshi, R.
    Srinivas, Konda
    MULTIAGENT AND GRID SYSTEMS, 2023, 19 (02) : 117 - 131
  • [9] Classification of Advertisement Text on Facebook Using Synthetic Minority Over-Sampling Technique
    Akkaradamrongrat, Suphamongkol
    Kachamas, Pornpimon
    Sinthupinyo, Sukree
    2018 INTERNATIONAL CONFERENCE ON ALGORITHMS, COMPUTING AND ARTIFICIAL INTELLIGENCE (ACAI 2018), 2018