Surrounding neighborhood-based SMOTE for learning from imbalanced data sets

Cited by: 43
Authors
García, V. [1 ]
Sánchez, J.S. [1 ]
Martín-Félez, R. [1 ]
Mollineda, R.A. [1 ]
Affiliations
[1] Institute of New Imaging Technologies, Department of Computer Languages and Systems, Universitat Jaume I, Av. Vicent Sos Baynat s/n, 12071 Castellón de la Plana, Spain
Keywords
Imbalance; Over-sampling; Surrounding neighborhood; Nearest centroid neighborhood; Gabriel graph; Relative neighborhood graph; SMOTE
DOI
10.1007/s13748-012-0027-5
Abstract
Many traditional approaches to pattern classification assume that the problem classes share similar prior probabilities. However, in many real-life applications, this assumption is grossly violated. Often, the ratios of prior probabilities between classes are extremely skewed. This situation is known as the class imbalance problem. One of the strategies to tackle this problem consists of balancing the classes by resampling the original data set. The SMOTE algorithm is probably the most popular technique to increase the size of the minority class by generating synthetic instances. From the idea of the original SMOTE, we here propose the use of three approaches to surrounding neighborhood with the aim of generating artificial minority instances, but taking into account both the proximity and the spatial distribution of the examples. Experiments over a large collection of databases and using three different classifiers demonstrate that the new surrounding neighborhood-based SMOTE procedures significantly outperform other existing over-sampling algorithms. © 2012 Springer-Verlag Berlin Heidelberg.
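As an illustration of the synthetic-instance generation the abstract refers to, here is a minimal sketch of the original SMOTE interpolation step: pick a minority example, pick one of its k nearest minority neighbours, and place a new point at a random position on the segment between them. This is not the paper's surrounding-neighborhood variants (nearest centroid neighborhood, Gabriel graph, relative neighborhood graph), which select neighbours differently. The function name `smote_sketch`, its parameters, and the toy data are hypothetical and chosen only for this example.

```python
import random


def smote_sketch(minority, k=3, n_new=5, seed=0):
    """Plain SMOTE-style over-sampling (illustrative, not the paper's method).

    minority: list of equal-length tuples of floats (minority-class examples).
    Returns n_new synthetic points, each on a segment between a minority
    example and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority examples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Toy minority class (hypothetical data).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, k=2, n_new=4)
```

Because every synthetic point lies on a segment between two existing minority examples, it always falls inside the bounding box of the minority class. Proximity-only neighbour selection ignores how the examples are spread around the base point, which is the aspect the surrounding-neighborhood approaches are meant to address.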
Pages: 347 - 362
Page count: 15
Related papers
50 in total
  • [21] Imbalanced data learning using SMOTE and deep learning architecture with optimized features
    Suja A. Alex
    Neural Computing and Applications, 2025, 37 (2) : 967 - 984
  • [22] A synthetic neighborhood generation based ensemble learning for the imbalanced data classification
    Chen, Zhi
    Lin, Tao
    Xia, Xin
    Xu, Hongyan
    Ding, Sha
    APPLIED INTELLIGENCE, 2018, 48 (08) : 2441 - 2457
  • [24] A multiple resampling method for learning from imbalanced data sets
    Estabrooks, A
    Jo, TH
    Japkowicz, N
    COMPUTATIONAL INTELLIGENCE, 2004, 20 (01) : 18 - 36
  • [25] AFNFS: Adaptive fuzzy neighborhood-based feature selection with adaptive synthetic over-sampling for imbalanced data
    Sun, Lin
    Li, Mengmeng
    Ding, Weiping
    Zhang, En
    Mu, Xiaoxia
    Xu, Jiucheng
    INFORMATION SCIENCES, 2022, 612 : 724 - 744
  • [26] Iterative Nearest Neighborhood Oversampling in Semisupervised Learning from Imbalanced Data
    Li, Fengqi
    Yu, Chuang
    Yang, Nanhai
    Xia, Feng
    Li, Guangming
    Kaveh-Yazdy, Fatemeh
    SCIENTIFIC WORLD JOURNAL, 2013
  • [27] Ensemble classification algorithm based improved SMOTE for imbalanced data
    Ning, Liu, 1600, Natsional'nyi Hirnychyi Universytet
  • [29] Optimization of SMOTE for imbalanced data based on AdaRBFNN and hybrid metaheuristics
    Wang, Zicheng
    Sun, Yanrui
    INTELLIGENT DATA ANALYSIS, 2021, 25 (03) : 541 - 554
  • [30] Research on expansion and classification of imbalanced data based on SMOTE algorithm
    Wang, Shujuan
    Dai, Yuntao
    Shen, Jihong
    Xuan, Jingxue
    SCIENTIFIC REPORTS, 2021, 11 (01)