SMOTE: Synthetic minority over-sampling technique

Cited by: 15889
Authors
Chawla, Nitesh V. [1]
Bowyer, Kevin W. [2]
Hall, Lawrence O. [1]
Kegelmeyer, W. Philip [3]
Affiliations
[1] Department of Computer Science and Engineering, ENB 118, University of South Florida, 4202 E. Fowler Ave, Tampa, FL 33620-5399, United States
[2] Department of Computer Science and Engineering, 384 Fitzpatrick Hall, University of Notre Dame, Notre Dame, IN 46556, United States
[3] Sandia National Laboratories, Biosystems Research Department, MS 9951, P.O. Box 969, Livermore, CA, United States
Source
2002 / American Association for Artificial Intelligence / Vol. 16
Keywords
Artificial intelligence; Classification (of information); Error analysis; Performance; Sensitivity analysis
DOI
10.1613/jair.953
Abstract
An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Often real-world datasets are predominantly composed of normal examples with only a small percentage of abnormal or interesting examples. It is also the case that the cost of misclassifying an abnormal (interesting) example as a normal example is often much higher than the cost of the reverse error. Under-sampling of the majority (normal) class has been proposed as a good means of increasing the sensitivity of a classifier to the minority class. This paper shows that a combination of our method of over-sampling the minority (abnormal) class and under-sampling the majority (normal) class can achieve better classifier performance (in ROC space) than only under-sampling the majority class. This paper also shows that a combination of our method of over-sampling the minority class and under-sampling the majority class can achieve better classifier performance (in ROC space) than varying the loss ratios in Ripper or class priors in Naive Bayes. Our method of over-sampling the minority class involves creating synthetic minority class examples. Experiments are performed using C4.5, Ripper and a Naive Bayes classifier. The method is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy. © 2002 AI Access Foundation and Morgan Kaufmann Publishers. All rights reserved.
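The combination described in the abstract (synthetic over-sampling of the minority class plus random under-sampling of the majority class) might be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes synthetic examples are formed by interpolating between a minority sample and one of its k nearest minority neighbours, as SMOTE is commonly described, and it picks base samples at random for brevity. The function names `smote` and `undersample`, the parameters `n_synthetic`, `k`, `n_keep` and `seed`, and the toy data are all hypothetical.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority neighbours.

    Simplified sketch: base samples are drawn at random rather than by
    looping over every minority sample.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the other minority samples, nearest (Euclidean) first.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # Random position on the segment from base to neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def undersample(majority, n_keep, seed=0):
    """Randomly keep n_keep majority samples, without replacement."""
    return random.Random(seed).sample(majority, n_keep)


# Toy data (made up for illustration): 4 minority and 20 majority points.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
majority = [[float(x), float(y)] for x in range(5) for y in range(4)]

new_points = smote(minority, n_synthetic=6, k=3)
kept = undersample(majority, n_keep=10)
balanced_minority = minority + new_points  # 10 minority vs 10 majority
```

Because each synthetic point is a convex combination of two existing minority samples, it always lies on the line segment between them, so the over-sampled class stays inside the region the original minority examples span.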
Related papers
50 in total
  • [21] A clustered borderline synthetic minority over-sampling technique for balancing quick access recorder data
    Li, Kunpeng
    Xu, Junjie
    Zhao, Huimin
    Deng, Wu
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2023, 45 (04) : 6849 - 6862
  • [22] CORE: core-based synthetic minority over-sampling and borderline majority under-sampling technique
    Bunkhumpornpat, Chumphol
    Sinapiromsaran, Krung
    INTERNATIONAL JOURNAL OF DATA MINING AND BIOINFORMATICS, 2015, 12 (01) : 44 - 58
  • [23] An Improved Over-sampling Algorithm based on iForest and SMOTE
    Zheng, Yifeng
    Li, Guohe
    Zhang, Teng
    2019 8TH INTERNATIONAL CONFERENCE ON SOFTWARE AND COMPUTER APPLICATIONS (ICSCA 2019), 2019, : 75 - 80
  • [24] A hybrid sampling algorithm combining synthetic minority over-sampling technique and edited nearest neighbor for missed abortion diagnosis
    Yang, Fangyuan
    Wang, Kang
    Sun, Lisha
    Zhai, Mengjiao
    Song, Jiejie
    Wang, Hong
    BMC MEDICAL INFORMATICS AND DECISION MAKING, 2022, 22 (01)
  • [25] Efficiently Predicting Hot Spots in PPIs by Combining Random Forest and Synthetic Minority Over-Sampling Technique
    Zhang, Xiaolong
    Lin, Xiaoli
    Zhao, Jiafu
    Huang, Qianqian
    Xu, Xin
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2019, 16 (03) : 774 - 781
  • [26] Synthetic Minority Image Over-sampling Technique: How to Improve AUC for Glioblastoma Patient Survival Prediction
    Liu, Renhao
    Hall, Lawrence O.
    Bowyer, Kevin W.
    Goldgof, Dmitry B.
    Gatenby, Robert
    Ben Ahmed, Kaoutar
    2017 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2017, : 1357 - 1362
  • [28] An efficient algorithm coupled with synthetic minority over-sampling technique to classify imbalanced PubChem BioAssay data
    Hao, Ming
    Wang, Yanli
    Bryant, Stephen H.
    ANALYTICA CHIMICA ACTA, 2014, 806 : 117 - 127
  • [29] Robust diabetic prediction using ensemble machine learning models with synthetic minority over-sampling technique
    Sampath, Pradeepa
    Elangovan, Gurupriya
    Ravichandran, Kaaveya
    Shanmuganathan, Vimal
    Pasupathi, Subbulakshmi
    Chakrabarti, Tulika
    Chakrabarti, Prasun
    Margala, Martin
    Scientific Reports, 14 (1)
  • [30] RCSMOTE: Range-Controlled synthetic minority over-sampling technique for handling the class imbalance problem
    Soltanzadeh, Paria
    Hashemzadeh, Mahdi
    INFORMATION SCIENCES, 2021, 542 : 92 - 111