Surrounding neighborhood-based SMOTE for learning from imbalanced data sets

Authors
V. García
J. S. Sánchez
R. Martín-Félez
R. A. Mollineda
Affiliation
[1] Universitat Jaume I, Institute of New Imaging Technologies, Department of Computer Languages and Systems
Keywords
Imbalance; Over-sampling; Surrounding neighborhood; Nearest centroid neighborhood; Gabriel graph; Relative neighborhood graph; SMOTE
DOI
10.1007/s13748-012-0027-5
Abstract
Many traditional approaches to pattern classification assume that the problem classes share similar prior probabilities. However, in many real-life applications, this assumption is grossly violated. Often, the ratios of prior probabilities between classes are extremely skewed. This situation is known as the class imbalance problem. One of the strategies to tackle this problem consists of balancing the classes by resampling the original data set. The SMOTE algorithm is probably the most popular technique to increase the size of the minority class by generating synthetic instances. From the idea of the original SMOTE, we here propose the use of three approaches to surrounding neighborhood with the aim of generating artificial minority instances, but taking into account both the proximity and the spatial distribution of the examples. Experiments over a large collection of databases and using three different classifiers demonstrate that the new surrounding neighborhood-based SMOTE procedures significantly outperform other existing over-sampling algorithms.
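As a rough illustration of the synthetic-instance generation mentioned in the abstract, the following is a minimal sketch of the interpolation step used by the original SMOTE, not the surrounding-neighborhood variants proposed in the paper. The function name `smote_sketch`, its parameters, and the toy data are hypothetical and chosen only for this example. Under the assumption that the proposed variants differ mainly in how neighbors are selected, they would replace the k-nearest-neighbor step with a surrounding neighborhood such as those listed in the keywords.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Hypothetical helper: create n_new synthetic minority points.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours
    (Euclidean distance), as in the original SMOTE idea.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x, excluding x itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        # Interpolate feature by feature between x and the chosen neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


# Toy minority class (assumed data, for illustration only)
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_new=6, k=2)
```

Because every synthetic point is a convex combination of two existing minority samples, the generated data stay inside the region already spanned by the minority class.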
Pages: 347-362
Page count: 15
Related papers
50 in total
  • [1] Surrounding neighborhood-based SMOTE for learning from imbalanced data sets
    Garcia, V.
    Sanchez, J. S.
    Martin-Felez, R.
    Mollineda, R. A.
    [J]. PROGRESS IN ARTIFICIAL INTELLIGENCE, 2012, 1 (04) : 347 - 362
  • [2] Imbalanced Learning Based on Data-Partition and SMOTE
    Guo, Huaping
    Zhou, Jun
    Wu, Chang-An
    [J]. INFORMATION, 2018, 9 (09)
  • [3] FUZZY AND SMOTE RESAMPLING TECHNIQUE FOR IMBALANCED DATA SETS
    Zorkeflee, Maisarah
    Din, Aniza Mohamed
    Ku-Mahamud, Ku Ruhana
    [J]. PROCEEDINGS OF THE 5TH INTERNATIONAL CONFERENCE ON COMPUTING & INFORMATICS, 2015, : 638 - 643
  • [4] A New Over-Sampling Approach: Random-SMOTE for Learning from Imbalanced Data Sets
    Dong, Yanjie
    Wang, Xuehua
    [J]. KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, 2011, 7091 : 343 - 352
  • [5] Feature selection for imbalanced data based on neighborhood rough sets
    Chen, Hongmei
    Li, Tianrui
    Fan, Xin
    Luo, Chuan
    [J]. INFORMATION SCIENCES, 2019, 483 : 1 - 20
  • [6] NMGRS: Neighborhood-based multigranulation rough sets
    Lin, Guoping
    Qian, Yuhua
    Li, Jinjin
    [J]. INTERNATIONAL JOURNAL OF APPROXIMATE REASONING, 2012, 53 (07) : 1080 - 1093
  • [7] Reduction of Neighborhood-Based Generalized Rough Sets
    Wang, Zhaohao
    Shu, Lan
    Ding, Xiuyong
    [J]. JOURNAL OF APPLIED MATHEMATICS, 2011,
  • [8] Balanced Neighborhood Classifiers for Imbalanced Data Sets
    Zhu, Shunzhi
    Ma, Ying
    Pan, Weiwei
    Zhu, Xiatian
    Luo, Guangchun
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2014, E97D (12) : 3226 - 3229
  • [9] Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning
    Han, H
    Wang, WY
    Mao, BH
    [J]. ADVANCES IN INTELLIGENT COMPUTING, PT 1, PROCEEDINGS, 2005, 3644 : 878 - 887
  • [10] Noise Avoidance SMOTE in Ensemble Learning for Imbalanced Data
    Kim, Kyoungok
    [J]. IEEE ACCESS, 2021, 9 : 143250 - 143265