SMOTE: Synthetic minority over-sampling technique

Cited by: 15889
Authors
Chawla, Nitesh V. [1]
Bowyer, Kevin W. [2]
Hall, Lawrence O. [1]
Kegelmeyer, W. Philip [3]
Affiliations
[1] Department of Computer Science and Engineering, ENB 118, University of South Florida, 4202 E. Fowler Ave, Tampa, FL 33620-5399, United States
[2] Department of Computer Science and Engineering, 384 Fitzpatrick Hall, University of Notre Dame, Notre Dame, IN 46556, United States
[3] Sandia National Laboratories, Biosystems Research Department, MS 9951, P.O. Box 969, Livermore, CA, United States
Source
2002 / American Association for Artificial Intelligence / Issue 16
Keywords
Artificial intelligence - Classification (of information) - Error analysis - Performance - Sensitivity analysis
DOI
10.1613/jair.953
Abstract
An approach to the construction of classifiers from unbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Often real-world data sets are predominately composed of normal examples with only a small percentage of abnormal or interesting examples. It is also the case that the cost of misclassifying an abnormal (interesting) example as a normal example is often much higher than the cost of the reverse error. Under-sampling of the majority (normal) class has been proposed as a good means of increasing the sensitivity of a classifier to the minority class. This paper shows that a combination of our method of over-sampling the minority (abnormal) class and under-sampling the majority (normal) class can achieve better classifier performance (in ROC space) than only under-sampling the majority class. This paper also shows that a combination of our method of over-sampling the minority class and under-sampling the majority class can achieve better classifier performance (in ROC space) than varying the loss ratios in Ripper or class priors in Naive Bayes. Our method of over-sampling the minority class involves creating synthetic minority class examples. Experiments are performed using C4.5, Ripper and a Naive Bayes classifier. The method is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy. © 2002 AI Access Foundation and Morgan Kaufmann Publishers. All rights reserved.
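The abstract states only that the method creates synthetic minority class examples. As a minimal sketch, assuming the commonly described SMOTE procedure (each synthetic point is placed at a random position on the line segment between a minority example and one of its k nearest minority neighbors), the idea might look like the following. The function name `smote_sketch` and its parameters are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority neighbors.

    Hypothetical helper for illustration; assumes numeric feature vectors.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority examples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbors than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority examples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Pick one of the k nearest neighbors at random.
        neighbor = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_sketch(minority, n_new=6, k=2)
```

Because every synthetic point is a convex combination of two existing minority examples, it always falls inside the region already spanned by the minority class. In the setting the abstract describes, such over-sampling would be combined with under-sampling of the majority class before training a classifier.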
Related papers
50 in total
  • [41] Enhancing Cascade Quality Prediction Method in Handling Imbalanced Dataset Using Synthetic Minority Over-Sampling Technique
    Julian, Fajar Azhari
    Arif, Fahmi
    INDUSTRIAL ENGINEERING AND MANAGEMENT SYSTEMS, 2023, 22 (04): 389-398
  • [42] A Back Propagation Neural Network Model with the Synthetic Minority Over-Sampling Technique for Construction Company Bankruptcy Prediction
    Thanh-Long, Ngo
    Tran-Minh
    Hong-Chuong, Le
    INTERNATIONAL JOURNAL OF SUSTAINABLE CONSTRUCTION ENGINEERING AND TECHNOLOGY, 2022, 13 (03): 68-79
  • [43] Applying Synthetic Minority Over-sampling Technique and Support Vector Machine to Develop a Classifier for Parkinson's disease
    Byeon, Haewon
    Kim, Byungsoo
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2021, 12 (03): 96-101
  • [44] Classification of imbalanced PubChem BioAssay data using an efficient algorithm coupled with synthetic minority over-sampling technique
    Hao, Ming
    Wang, Yanli
    Bryant, Stephen H.
    ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 2014, 247
  • [45] Arabic Authorship Attribution Using Synthetic Minority Over-Sampling Technique and Principal Components Analysis for Imbalanced Documents
    Hadjadj, Hassina
    Sayoud, Halim
    INTERNATIONAL JOURNAL OF COGNITIVE INFORMATICS AND NATURAL INTELLIGENCE, 2021, 15 (04)
  • [46] A self-adaptive synthetic over-sampling technique for imbalanced classification
    Gu, Xiaowei
    Angelov, Plamen P.
    Soares, Eduardo A.
    INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2020, 35 (06): 923-943
  • [47] Neighborhood Triangular Synthetic Minority Over-sampling Technique for Imbalanced Prediction on Small Samples of Chinese Tourism and Hospitality Firms
    Xu, Yu-Hui
    Li, Hui
    Le, Lu-Ping
    Tian, Xiao-Yun
    2014 SEVENTH INTERNATIONAL JOINT CONFERENCE ON COMPUTATIONAL SCIENCES AND OPTIMIZATION (CSO), 2014: 534-538
  • [48] Generative adversarial minority enlargement-A local linear over-sampling synthetic method
    Wang, Ke
    Zhou, Tongqing
    Luo, Menghua
    Li, Xionglve
    Cai, Zhiping
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 237
  • [49] Predicting Nurse Turnover for Highly Imbalanced Data Using the Synthetic Minority Over-Sampling Technique and Machine Learning Algorithms
    Xu, Yuan
    Park, Yongshin
    Park, Ju Dong
    Sun, Bora
    HEALTHCARE, 2023, 11 (24)
  • [50] Scholarship Recipients Prediction Model using k-Nearest Neighbor Algorithm and Synthetic Minority Over-sampling Technique
    Kurniadi, Dede
    Nuraeni, Fitri
    Abania, Nia
    Fitriani, Leni
    Mulyani, Asri
    Agustin, Yoga Handoko
    2022 12TH INTERNATIONAL CONFERENCE ON SYSTEM ENGINEERING AND TECHNOLOGY (ICSET 2022), 2022: 89-94