A First Attempt on Global Evolutionary Undersampling for Imbalanced Big Data

Cited by: 0
Authors
Triguero, I. [1]
Galar, M. [3]
Bustince, H. [3]
Herrera, F. [2]
Affiliations
[1] Univ Nottingham, Sch Comp Sci, Nottingham, England
[2] Univ Granada, Dept Comp Sci & Artificial Intelligence, CITIC UGR, E-18071 Granada, Spain
[3] Univ Publ Navarra, Dept Automat & Computat, Campus Arrosadia S-N, Pamplona 31006, Spain
Keywords
MAPREDUCE; CLASSIFICATION; INSIGHT;
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications];
Subject classification codes
081203; 0835;
Abstract
The design of efficient big data learning models has become a common need in a great number of applications. The massive amounts of available data may hinder the use of traditional data mining techniques, especially when evolutionary algorithms are involved as a key step. Existing solutions typically follow a divide-and-conquer approach in which the data is split into several chunks that are addressed individually. The partial knowledge acquired from each chunk is then aggregated in various ways to solve the entire problem. However, these approaches lack a global view of the data as a whole, which may result in less accurate models. In this work we carry out a first attempt at designing a global evolutionary undersampling model for imbalanced classification problems. Such problems are characterised by a highly skewed class distribution, and evolutionary models are used to balance the dataset by selecting only the most relevant data. Using Apache Spark as the big data technology, we have introduced a number of variations to the well-known CHC algorithm so that it can work with very large chromosomes and reduce the cost associated with fitness evaluation. We discuss some preliminary results, showing the great potential of this new kind of evolutionary big data model.
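The evolutionary undersampling idea summarised in the abstract can be illustrated with a minimal single-machine sketch. It is not the authors' Spark implementation: the toy data, the function names (`g_mean`, `fitness`, `hux`, `evolve`), the fitness (plain geometric mean of per-class recall for a leave-one-out 1-NN classifier) and all parameter values are assumptions made for illustration. The loop keeps three standard CHC ingredients (HUX crossover, incest prevention, elitist survival) and omits the restart step.

```python
import math
import random


def g_mean(data, selected_idx):
    """Geometric mean of per-class recall for 1-NN restricted to selected_idx.

    Every instance in `data` is classified by its nearest selected
    neighbour, skipping itself (leave-one-out).
    """
    hits, totals = {0: 0, 1: 0}, {0: 0, 1: 0}
    for i, (x, y) in enumerate(data):
        totals[y] += 1
        best_label, best_dist = None, math.inf
        for j in selected_idx:
            if j == i:
                continue
            xj, yj = data[j]
            dist = sum((a - b) ** 2 for a, b in zip(x, xj))
            if dist < best_dist:
                best_dist, best_label = dist, yj
        if best_label == y:
            hits[y] += 1
    return math.sqrt((hits[0] / totals[0]) * (hits[1] / totals[1]))


def fitness(chromosome, data, majority_idx, minority_idx):
    """One bit per majority instance; all minority instances are always kept."""
    kept = [i for bit, i in zip(chromosome, majority_idx) if bit]
    return g_mean(data, minority_idx + kept)


def hux(a, b, rng):
    """HUX crossover: swap exactly half of the bits on which the parents differ."""
    diff = [k for k in range(len(a)) if a[k] != b[k]]
    c1, c2 = a[:], b[:]
    for k in rng.sample(diff, len(diff) // 2):
        c1[k], c2[k] = b[k], a[k]
    return c1, c2


def evolve(data, majority_idx, minority_idx, pop_size=10, generations=30, seed=0):
    """Simplified CHC loop (no restart): returns (best_fitness, best_chromosome)."""
    rng = random.Random(seed)
    n = len(majority_idx)

    def score(c):
        return fitness(c, data, majority_idx, minority_idx)

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    scored = sorted(((score(c), c) for c in pop), reverse=True)
    threshold = n // 4  # incest-prevention threshold
    for _ in range(generations):
        if threshold < 0:
            break  # full CHC would restart the population here
        parents = [c for _, c in scored]
        rng.shuffle(parents)
        children = []
        for a, b in zip(parents[::2], parents[1::2]):
            hamming = sum(x != y for x, y in zip(a, b))
            if hamming / 2 > threshold:  # mate only sufficiently different parents
                children.extend(hux(a, b, rng))
        pool = scored + [(score(c), c) for c in children]
        merged = sorted(pool, reverse=True)[:pop_size]  # elitist survival
        if merged == scored:
            threshold -= 1  # no child survived: relax the mating restriction
        scored = merged
    return scored[0]


# Toy imbalanced dataset: 40 majority (label 0) and 8 minority (label 1) points.
data_rng = random.Random(42)
data = [((data_rng.gauss(0, 1), data_rng.gauss(0, 1)), 0) for _ in range(40)]
data += [((data_rng.gauss(2, 1), data_rng.gauss(2, 1)), 1) for _ in range(8)]
majority_idx = list(range(40))
minority_idx = list(range(40, 48))

best_fit, best = evolve(data, majority_idx, minority_idx)
print(f"g-mean {best_fit:.3f} using {sum(best)} of {len(best)} majority instances")
```

In this sketch the chromosome length equals the number of majority-class instances, which is why very large datasets lead to the very large chromosomes and costly fitness evaluations that the abstract mentions.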
Pages: 2054-2061
Page count: 8
Related papers
50 in total
  • [1] Evolutionary Undersampling for Imbalanced Big Data Classification
    Triguero, I.
    Galar, M.
    Vluymans, S.
    Cornelis, C.
    Bustince, H.
    Herrera, F.
    Saeys, Y.
    2015 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2015, : 715 - 722
  • [2] Evolutionary Undersampling for Extremely Imbalanced Big Data Classification under Apache Spark
    Triguero, I.
    Galar, M.
    Merino, D.
    Maillo, J.
    Bustince, H.
    Herrera, F.
    2016 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2016, : 640 - 647
  • [3] PSU: Particle Stacking Undersampling Method for Highly Imbalanced Big Data
    Jeon, Yong-Seok
    Lim, Dong-Joon
    IEEE ACCESS, 2020, 8 : 131920 - 131927
  • [4] Efficient hybrid oversampling and intelligent undersampling for imbalanced big data classification
    Vairetti, Carla
    Assadi, Jose Luis
    Maldonado, Sebastian
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 246
  • [5] EUSBoost: Enhancing ensembles for highly imbalanced data-sets by evolutionary undersampling
    Galar, Mikel
    Fernandez, Alberto
    Barrenechea, Edurne
    Herrera, Francisco
    PATTERN RECOGNITION, 2013, 46 (12) : 3460 - 3471
  • [6] A Hybrid Surrogate Model for Evolutionary Undersampling in Imbalanced Classification
    Le, Hoang Lam
    Landa-Silva, Dario
    Galar, Mikel
    Garcia, Salvador
    Triguero, I.
    2020 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2020,
  • [7] Evolutionary Undersampling for Classification with Imbalanced Datasets: Proposals and Taxonomy
    Garcia, Salvador
    Herrera, Francisco
    EVOLUTIONARY COMPUTATION, 2009, 17 (03) : 275 - 306
  • [8] Addressing data complexity for imbalanced data sets: analysis of SMOTE-based oversampling and evolutionary undersampling
    Luengo, Julian
    Fernandez, Alberto
    Garcia, Salvador
    Herrera, Francisco
    SOFT COMPUTING, 2011, 15 (10) : 1909 - 1936
  • [9] Addressing data complexity for imbalanced data sets: analysis of SMOTE-based oversampling and evolutionary undersampling
    Julián Luengo
    Alberto Fernández
    Salvador García
    Francisco Herrera
    Soft Computing, 2011, 15 : 1909 - 1936
  • [10] Evolutionary undersampling boosting for imbalanced classification of breast cancer malignancy
    Krawczyk, Bartosz
    Galar, Mikel
    Jelen, Lukasz
    Herrera, Francisco
    APPLIED SOFT COMPUTING, 2016, 38 : 714 - 726