SMOTE: Synthetic minority over-sampling technique

Cited by: 15889
Authors
Chawla, Nitesh V. [1]
Bowyer, Kevin W. [2]
Hall, Lawrence O. [1]
Kegelmeyer, W. Philip [3]
Affiliations
[1] Department of Computer Science and Engineering, ENB 118, University of South Florida, 4202 E. Fowler Ave, Tampa, FL 33620-5399, United States
[2] Department of Computer Science and Engineering, 384 Fitzpatrick Hall, University of Notre Dame, Notre Dame, IN 46556, United States
[3] Sandia National Laboratories, Biosystems Research Department, MS 9951, P.O. Box 969, Livermore, CA, United States
Source
2002 / Volume: American Association for Artificial Intelligence / Issue: 16
Keywords
Artificial intelligence - Classification (of information) - Error analysis - Performance - Sensitivity analysis
DOI
10.1613/jair.953
Abstract
An approach to the construction of classifiers from unbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Often real-world data sets are predominately composed of normal examples with only a small percentage of abnormal or interesting examples. It is also the case that the cost of misclassifying an abnormal (interesting) example as a normal example is often much higher than the cost of the reverse error. Under-sampling of the majority (normal) class has been proposed as a good means of increasing the sensitivity of a classifier to the minority class. This paper shows that a combination of our method of over-sampling the minority (abnormal) class and under-sampling the majority (normal) class can achieve better classifier performance (in ROC space) than only under-sampling the majority class. This paper also shows that a combination of our method of over-sampling the minority class and under-sampling the majority class can achieve better classifier performance (in ROC space) than varying the loss ratios in Ripper or class priors in Naive Bayes. Our method of over-sampling the minority class involves creating synthetic minority class examples. Experiments are performed using C4.5, Ripper and a Naive Bayes classifier. The method is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy. © 2002 AI Access Foundation and Morgan Kaufmann Publishers. All rights reserved.
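The abstract says only that the method creates synthetic minority class examples. A minimal sketch of how SMOTE is commonly described follows: each synthetic point is placed at a random position on the line segment between a minority sample and one of its k nearest minority-class neighbours. The function name `smote`, its parameters, and the toy data are assumptions for illustration, not the authors' code. As a simplification, this sketch picks base samples at random, whereas the paper's algorithm generates a fixed number of synthetic points per minority sample.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority samples.

    minority: list of equal-length numeric feature vectors, one class only.
    Hypothetical helper for illustration; not the authors' implementation.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_synthetic):
        # Simplification: choose the base sample at random.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Brute-force k nearest minority neighbours by Euclidean distance.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with two features (made-up values).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_synthetic=6, k=2)
```

Because every synthetic point is a convex combination of two existing minority samples, it always lies inside the bounding box of the minority class. In the combined scheme the abstract describes, these points would be added to the training set while the majority class is under-sampled.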
Related papers
50 in total
  • [31] An Improved Intrusion Detection Approach using Synthetic Minority Over-Sampling Technique and Deep Belief Network
    Adil, S. Hasan
    Ali, S. Saad Azhar
    Raza, Kamran
    Hussaan, A. Mahmood
    NEW TRENDS IN SOFTWARE METHODOLOGIES, TOOLS AND TECHNIQUES, 2014, 265 : 94 - 102
  • [32] Synthetic Minority Over-Sampling Technique based on Fuzzy C-means Clustering for Imbalanced Data
    Lee, Hansoo
    Jung, Seunghyan
    Kim, Minseok
    Kim, Sungshin
    2017 INTERNATIONAL CONFERENCE ON FUZZY THEORY AND ITS APPLICATIONS (IFUZZY), 2017
  • [33] A topological approach for mammographic density classification using a modified synthetic minority over-sampling technique algorithm
    Nedjar, Imane
    Mahmoudi, Said
    Chikh, Mohamed Amine
    INTERNATIONAL JOURNAL OF BIOMEDICAL ENGINEERING AND TECHNOLOGY, 2022, 38 (02) : 193 - 214
  • [34] LVQ-SMOTE – Learning Vector Quantization based Synthetic Minority Over–sampling Technique for biomedical data
    Nakamura, Munehiro
    Kajiwara, Yusuke
    Otsuka, Atsushi
    Kimura, Haruhiko
    BioData Mining, 6
  • [35] Over-Sampling Method on Imbalanced Data Based on WKMeans and SMOTE
    Chen, Junfeng
    Zheng, Zhongtuan
    Computer Engineering and Applications, 2024, 57 (23) : 106 - 112
  • [36] Searching for Optimal Oversampling to Process Imbalanced Data: Generative Adversarial Networks and Synthetic Minority Over-Sampling Technique
    Eom, Gayeong
    Byeon, Haewon
    MATHEMATICS, 2023, 11 (16)
  • [37] Precise transformer fault diagnosis via random forest model enhanced by synthetic minority over-sampling technique
    Prasojo, Rahman Azis
    Putra, Muhammad Akmal A.
    Ekojono
    Apriyani, Meyti Eka
    Rahmanto, Anugrah Nur
    Ghoneim, Sherif S. M.
    Mahmoud, Karar
    Lehtonen, Matti
    Darwish, Mohamed M. F.
    ELECTRIC POWER SYSTEMS RESEARCH, 2023, 220
  • [38] Speech Emotion Recognition Based on Selective Interpolation Synthetic Minority Over-Sampling Technique in Small Sample Environment
    Liu, Zhen-Tao
    Wu, Bao-Han
    Li, Dan-Yun
    Xiao, Peng
    Mao, Jun-Wei
    SENSORS, 2020, 20 (08)
  • [39] The selection of wart treatment method based on Synthetic Minority Over-sampling Technique and Axiomatic Fuzzy Set theory
    Jia, Wenjuan
    Xia, Hao
    Jia, Lijuan
    Deng, Yingjie
    Liu, Xiaodong
    BIOCYBERNETICS AND BIOMEDICAL ENGINEERING, 2020, 40 (01) : 517 - 526
  • [40] Dynamic Synthetic Minority Over-Sampling Technique-Based Rotation Forest for the Classification of Imbalanced Hyperspectral Data
    Feng, Wei
    Dauphin, Gabriel
    Huang, Wenjiang
    Quan, Yinghui
    Bao, Wenxing
    Wu, Mingquan
    Li, Qiang
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2019, 12 (07) : 2159 - 2169