Surrounding neighborhood-based SMOTE for learning from imbalanced data sets

Cited by: 43
Authors
García, V. [1 ]
Sánchez, J.S. [1 ]
Martín-Félez, R. [1 ]
Mollineda, R.A. [1 ]
Affiliations
[1] Institute of New Imaging Technologies, Department of Computer Languages and Systems, Universitat Jaume I, Av. Vicent Sos Baynat s/n, 12071 Castellón de la Plana, Spain
Keywords
Imbalance; Over-sampling; Surrounding neighborhood; Nearest centroid neighborhood; Gabriel graph; Relative neighborhood graph; SMOTE
DOI
10.1007/s13748-012-0027-5
Abstract
Many traditional approaches to pattern classification assume that the problem classes share similar prior probabilities. However, in many real-life applications, this assumption is grossly violated. Often, the ratios of prior probabilities between classes are extremely skewed. This situation is known as the class imbalance problem. One of the strategies to tackle this problem consists of balancing the classes by resampling the original data set. The SMOTE algorithm is probably the most popular technique to increase the size of the minority class by generating synthetic instances. From the idea of the original SMOTE, we here propose the use of three approaches to surrounding neighborhood with the aim of generating artificial minority instances, but taking into account both the proximity and the spatial distribution of the examples. Experiments over a large collection of databases and using three different classifiers demonstrate that the new surrounding neighborhood-based SMOTE procedures significantly outperform other existing over-sampling algorithms. © 2012 Springer-Verlag Berlin Heidelberg.
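For readers unfamiliar with the over-sampling step the abstract refers to, the following is a minimal sketch of plain SMOTE-style interpolation, not the surrounding-neighborhood variants the paper proposes. It assumes Euclidean distance and ordinary k-nearest neighbors inside the minority class; the function name `smote_sketch` and its parameters are hypothetical and chosen only for illustration.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points, each on the segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.

    Assumption: plain k-NN with Euclidean distance, as in basic SMOTE.
    The paper's surrounding-neighborhood variants are not implemented here.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the other minority samples, nearest to base first.
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class: the four corners of the unit square.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_sketch(minority, n_new=6, k=2)
```

With k=2 on this toy data, each corner's two nearest neighbours are its adjacent corners, so every synthetic point falls on an edge of the square. A surrounding-neighborhood variant would change only how the candidate neighbours are selected; the interpolation step would stay the same.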
Pages: 347-362
Page count: 15
Related papers
50 in total
  • [11] DeepSMOTE: Fusing Deep Learning and SMOTE for Imbalanced Data
    Dablain, Damien
    Krawczyk, Bartosz
    Chawla, Nitesh, V
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (09) : 6390 - 6404
  • [12] Addressing data complexity for imbalanced data sets: analysis of SMOTE-based oversampling and evolutionary undersampling
    Julián Luengo
    Alberto Fernández
    Salvador García
    Francisco Herrera
    Soft Computing, 2011, 15 : 1909 - 1936
  • [13] DTO-SMOTE: Delaunay Tessellation Oversampling for Imbalanced Data Sets
    de Carvalho, Alexandre M.
    Prati, Ronaldo C.
    INFORMATION, 2020, 11 (12) : 1 - 22
  • [14] Addressing data complexity for imbalanced data sets: analysis of SMOTE-based oversampling and evolutionary undersampling
    Luengo, Julian
    Fernandez, Alberto
    Garcia, Salvador
    Herrera, Francisco
    SOFT COMPUTING, 2011, 15 (10) : 1909 - 1936
  • [15] SMOTE-DGC: An Imbalanced Learning Approach of Data Gravitation Based Classification
    Peng, Lizhi
    Zhang, Haibo
    Yang, Bo
    Chen, Yuehui
    Zhou, Xiaoqing
    INTELLIGENT COMPUTING THEORIES AND APPLICATION, ICIC 2016, PT II, 2016, 9772 : 133 - 144
  • [16] SMOTE-RSB*: a hybrid preprocessing approach based on oversampling and undersampling for high imbalanced data-sets using SMOTE and rough sets theory
    Enislay Ramentol
    Yailé Caballero
    Rafael Bello
    Francisco Herrera
    Knowledge and Information Systems, 2012, 33 : 245 - 265
  • [17] SMOTE-RSB*: a hybrid preprocessing approach based on oversampling and undersampling for high imbalanced data-sets using SMOTE and rough sets theory
    Ramentol, Enislay
    Caballero, Yaile
    Bello, Rafael
    Herrera, Francisco
    KNOWLEDGE AND INFORMATION SYSTEMS, 2012, 33 (02) : 245 - 265
  • [18] Learning imbalanced datasets based on SMOTE and Gaussian distribution
    Pan, Tingting
    Zhao, Junhong
    Wu, Wei
    Yang, Jie
    INFORMATION SCIENCES, 2020, 512 : 1214 - 1233
  • [19] A histogram SMOTE-based sampling algorithm with incremental learning for imbalanced data classification
    Liaw, Lawrence Chuin Ming
    Tan, Shing Chiang
    Goh, Pey Yun
    Lim, Chee Peng
    INFORMATION SCIENCES, 2025, 686
  • [20] A LEARNING METHOD FOR IMBALANCED DATA SETS
    de la Calleja, Jorge
    Fuentes, Olac
    Gonzalez, Jesus
    Aceves-Perez, Rita M.
    KDIR 2009: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND INFORMATION RETRIEVAL, 2009, : 307 - +