Surrounding neighborhood-based SMOTE for learning from imbalanced data sets

Cited by: 43
Authors
García, V. [1]
Sánchez, J.S. [1]
Martín-Félez, R. [1]
Mollineda, R.A. [1]
Affiliation
[1] Institute of New Imaging Technologies, Department of Computer Languages and Systems, Universitat Jaume I, Av. Vicent Sos Baynat s/n, 12071 Castellón de la Plana, Spain
Keywords
Imbalance; Over-sampling; Surrounding neighborhood; Nearest centroid neighborhood; Gabriel graph; Relative neighborhood graph; SMOTE
DOI
10.1007/s13748-012-0027-5
Abstract
Many traditional approaches to pattern classification assume that the problem classes share similar prior probabilities. However, in many real-life applications, this assumption is grossly violated. Often, the ratios of prior probabilities between classes are extremely skewed. This situation is known as the class imbalance problem. One of the strategies to tackle this problem consists of balancing the classes by resampling the original data set. The SMOTE algorithm is probably the most popular technique to increase the size of the minority class by generating synthetic instances. From the idea of the original SMOTE, we here propose the use of three approaches to surrounding neighborhood with the aim of generating artificial minority instances, but taking into account both the proximity and the spatial distribution of the examples. Experiments over a large collection of databases and using three different classifiers demonstrate that the new surrounding neighborhood-based SMOTE procedures significantly outperform other existing over-sampling algorithms. © 2012 Springer-Verlag Berlin Heidelberg.
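The interpolation step behind the original SMOTE mentioned in the abstract can be sketched as follows. This is a minimal illustration of plain k-nearest-neighbor SMOTE only, not the surrounding-neighborhood variants the paper proposes; the function name `smote`, its parameters, and the toy data are assumptions made for this example.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbors (Euclidean distance).
    Hypothetical helper for illustration, not the paper's procedure."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to the base sample.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


# Toy two-dimensional minority class (made-up values).
minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (2.5, 2.5)]
new_points = smote(minority, n_new=6, k=2)
```

Each synthetic point lies on the segment between a minority sample and one of its nearest minority neighbors. The neighborhood definitions named in the keywords (nearest centroid neighborhood, Gabriel graph, relative neighborhood graph) would replace the plain distance ranking in this sketch.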
Pages: 347-362
Number of pages: 15
Related papers
50 in total
  • [32] SMOTE for Learning from Imbalanced Data: Progress and Challenges, Marking the 15-year Anniversary
    Fernandez, Alberto
    Garcia, Salvador
    Herrera, Francisco
    Chawla, Nitesh V.
    JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2018, 61 : 863 - 905
  • [33] Distributed Storage Allocations for Neighborhood-based Data Access
    Jakovetic, Dusan
    Minja, Aleksandar
    Bajovic, Dragana
    Vukobratovic, Dejan
    2015 IEEE INFORMATION THEORY WORKSHOP (ITW), 2015
  • [34] Data poisoning attacks on neighborhood-based recommender systems
    Chen, Liang
    Xu, Yangjun
    Xie, Fenfang
    Huang, Min
    Zheng, Zibin
    TRANSACTIONS ON EMERGING TELECOMMUNICATIONS TECHNOLOGIES, 2021, 32 (06)
  • [35] Radius-SMOTE: A New Oversampling Technique of Minority Samples Based on Radius Distance for Learning From Imbalanced Data
    Pradipta, Gede Angga
    Wardoyo, Retantyo
    Musdholifah, Aina
    Sanjaya, I. Nyoman Hariyasa
    IEEE ACCESS, 2021, 9 : 74763 - 74777
  • [36] SMOTE based class-specific extreme learning machine for imbalanced learning
    Raghuwanshi, Bhagat Singh
    Shukla, Sanyam
    KNOWLEDGE-BASED SYSTEMS, 2020, 187
  • [37] A Supervised Learning Approach for Imbalanced Data Sets
    Nguyen, Giang H.
    Bouzerdoum, Abdesselam
    Phung, Son L.
    19TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOLS 1-6, 2008, : 3759 - 3762
  • [38] RN-SMOTE: Reduced Noise SMOTE based on DBSCAN for enhancing imbalanced data classification
    Arafa, Ahmed
    El-Fishawy, Nawal
    Badawy, Mohammed
    Radad, Marwa
    JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2022, 34 (08) : 5059 - 5074
  • [39] Entropy-based matrix learning machine for imbalanced data sets
    Zhu, Changming
    Wang, Zhe
    PATTERN RECOGNITION LETTERS, 2017, 88 : 72 - 80
  • [40] An Adaptive Sampling Ensemble Classifier for Learning from Imbalanced Data Sets
    Geiler, Ordonez Jon
    Hong, Li
    Yue-Jian, Guo
    INTERNATIONAL MULTICONFERENCE OF ENGINEERS AND COMPUTER SCIENTISTS (IMECS 2010), VOLS I-III, 2010, : 513 - 517