SMOTE-RSB*: a hybrid preprocessing approach based on oversampling and undersampling for high imbalanced data-sets using SMOTE and rough sets theory

Authors
Enislay Ramentol
Yailé Caballero
Rafael Bello
Francisco Herrera
Affiliations
[1] University of Camagüey, Department of Computer Science
[2] Universidad Central de Las Villas, Department of Computer Science
[3] University of Granada, Department of Computer Science and Artificial Intelligence, CITIC
Source
Knowledge and Information Systems, 2012, 33 (2)
Keywords
Imbalanced data-sets; Classification; Data preparation; Oversampling; Undersampling; Rough sets theory
Abstract
Imbalanced data is a common problem in classification, and it is growing in importance because it appears in most real-world domains. It is especially relevant for highly imbalanced data-sets, in which the ratio between the class sizes is large. Many techniques have been developed to tackle imbalanced training sets in supervised learning; they fall into two broad groups: those that work at the algorithm level and those that work at the data level. The most prominent data-level techniques balance the training set either by shrinking the larger class through the elimination of samples (undersampling) or by enlarging the smaller class through the construction of new samples (oversampling). This paper proposes a new hybrid method for preprocessing imbalanced data-sets that constructs new samples with the Synthetic Minority Oversampling Technique (SMOTE) and then applies an editing technique based on Rough Set Theory and the lower approximation of a subset. An experimental study using C4.5 as the learning algorithm validates the proposed method and shows good results.
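The hybrid described above chains two steps: SMOTE oversampling, then a rough-set editing step over the synthetic samples. The Python below is a minimal sketch of that pipeline under stated assumptions, not the authors' SMOTE-RSB* implementation. It assumes numeric features, uses a common randomized variant of SMOTE with Euclidean k-nearest neighbours, and assumes the editing step keeps a synthetic sample only if it lies in the lower approximation of the minority class, i.e. no majority sample is similar to it. The similarity function (1 minus the mean range-normalized absolute feature difference), the threshold of 0.9, and all function names are hypothetical choices made for illustration.

```python
import math
import random


def smote(minority, n_synthetic, k=5, rng=None):
    """Randomised SMOTE variant: each synthetic point lies on the segment
    between a random minority sample and one of its k nearest minority
    neighbours (Euclidean). Assumes at least two minority samples."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base`, excluding base itself
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def feature_ranges(samples):
    """Per-feature (max - min), used to normalise feature differences."""
    return [max(col) - min(col) for col in zip(*samples)]


def similarity(x, y, ranges):
    """Hypothetical similarity: 1 minus the mean range-normalised absolute
    difference between the features of x and y (1.0 means identical)."""
    diffs = [abs(a - b) / r if r > 0 else 0.0 for a, b, r in zip(x, y, ranges)]
    return 1.0 - sum(diffs) / len(diffs)


def lower_approximation_filter(synthetic, majority, ranges, threshold=0.9):
    """Keep a synthetic sample only if no majority sample is similar to it
    (similarity >= threshold). Under this similarity relation, such a sample
    belongs to the lower approximation of the minority class."""
    return [
        s for s in synthetic
        if all(similarity(s, m, ranges) < threshold for m in majority)
    ]


def smote_then_edit(minority, majority, threshold=0.9, k=5, seed=0):
    """Hybrid sketch: oversample towards balance with SMOTE, then edit the
    synthetic samples with the lower-approximation filter."""
    rng = random.Random(seed)
    n_needed = max(0, len(majority) - len(minority))
    synthetic = smote(minority, n_needed, k=k, rng=rng)
    ranges = feature_ranges(minority + majority)
    kept = lower_approximation_filter(synthetic, majority, ranges, threshold)
    return minority + kept


if __name__ == "__main__":
    minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2)]
    majority = [(float(x), float(y)) for x in range(5, 10) for y in range(5, 10)]
    new_minority = smote_then_edit(minority, majority, k=3)
    print(len(majority), len(new_minority))
```

With this toy data the minority cluster sits far from the majority grid, so every synthetic point survives the edit and the minority class grows to the majority's size. Raising `threshold` makes the similarity relation stricter, so fewer synthetic points count as similar to a majority sample and fewer are discarded; lowering it edits more aggressively.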
Pages: 245-265 (21 pages)
Related papers
  • [1] SMOTE-RSB*: a hybrid preprocessing approach based on oversampling and undersampling for high imbalanced data-sets using SMOTE and rough sets theory
    Ramentol, Enislay
    Caballero, Yailé
    Bello, Rafael
    Herrera, Francisco
    [J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2012, 33 (2) : 245 - 265
  • [2] Addressing data complexity for imbalanced data sets: analysis of SMOTE-based oversampling and evolutionary undersampling
    Luengo, Julián
    Fernández, Alberto
    García, Salvador
    Herrera, Francisco
    [J]. SOFT COMPUTING, 2011, 15 (10) : 1909 - 1936
  • [3] DTO-SMOTE: Delaunay Tessellation Oversampling for Imbalanced Data Sets
    de Carvalho, Alexandre M.
    Prati, Ronaldo C.
    [J]. INFORMATION, 2020, 11 (12) : 1 - 22
  • [4] Imbalanced Data Classification: A Novel Re-sampling Approach Combining Versatile Improved SMOTE and Rough Sets
    Borowska, Katarzyna
    Stepaniuk, Jaroslaw
    [J]. COMPUTER INFORMATION SYSTEMS AND INDUSTRIAL MANAGEMENT, CISIM 2016, 2016, 9842 : 31 - 42
  • [5] Surrounding neighborhood-based SMOTE for learning from imbalanced data sets
    García, V.
    Sánchez, J. S.
    Martín-Félez, R.
    Mollineda, R. A.
    [J]. PROGRESS IN ARTIFICIAL INTELLIGENCE, 2012, 1 (4) : 347 - 362
  • [6] A New Over-Sampling Approach: Random-SMOTE for Learning from Imbalanced Data Sets
    Dong, Yanjie
    Wang, Xuehua
    [J]. KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, 2011, 7091 : 343 - 352
  • [7] SCUT: Multi-Class Imbalanced Data Classification using SMOTE and Cluster-based Undersampling
    Agrawal, Astha
    Viktor, Herna L.
    Paquet, Eric
    [J]. 2015 7TH INTERNATIONAL JOINT CONFERENCE ON KNOWLEDGE DISCOVERY, KNOWLEDGE ENGINEERING AND KNOWLEDGE MANAGEMENT (IC3K), 2015 : 226 - 233
  • [8] SIA-SMOTE: A SMOTE-Based Oversampling Method with Better Interpolation on High-Dimensional Data by Using a Siamese Network
    Heroza, Rahmat Izwan
    Gan, John Q.
    Raza, Haider
    [J]. ADVANCES IN COMPUTATIONAL INTELLIGENCE, IWANN 2023, PT I, 2023, 14134 : 448 - 460