A First Attempt on Global Evolutionary Undersampling for Imbalanced Big Data

Authors
Triguero, I. [1]
Galar, M. [3]
Bustince, H. [3]
Herrera, F. [2]
Affiliations
[1] Univ Nottingham, Sch Comp Sci, Nottingham, England
[2] Univ Granada, Dept Comp Sci & Artificial Intelligence, CITIC UGR, E-18071 Granada, Spain
[3] Univ Publ Navarra, Dept Automat & Computat, Campus Arrosadia S-N, Pamplona 31006, Spain
Keywords
MAPREDUCE; CLASSIFICATION; INSIGHT;
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
The design of efficient big data learning models has become a common need in a great number of applications. The massive amounts of available data may hinder the use of traditional data mining techniques, especially when evolutionary algorithms are involved as a key step. Existing solutions typically follow a divide-and-conquer approach in which the data is split into several chunks that are processed individually. The partial knowledge acquired from each chunk is then aggregated in various ways to solve the entire problem. However, these approaches lack a global view of the data as a whole, which may result in less accurate models. In this work we make a first attempt at designing a global evolutionary undersampling model for imbalanced classification problems. Such problems are characterised by a highly skewed class distribution, and evolutionary models are used to balance the dataset by selecting only the most relevant data. Using Apache Spark as the big data technology, we introduce a number of variations to the well-known CHC algorithm so that it can work with very large chromosomes and reduce the costs associated with fitness evaluation. We discuss some preliminary results, showing the great potential of this new kind of evolutionary big data model.
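
To make the idea of evolutionary undersampling concrete, the following is a minimal single-machine sketch and not the authors' Spark implementation. It assumes a binary chromosome with one bit per majority-class instance, a hypothetical fitness defined as the geometric mean of per-class leave-one-out 1-NN accuracy, and a simplified CHC-inspired loop (HUX crossover plus elitist survivor selection; incest prevention and restarts are omitted). The names `gmean_fitness`, `hux`, and `evolve` are illustrative only.

```python
import numpy as np

def gmean_fitness(maj_mask, X_maj, X_min):
    """Geometric mean of per-class 1-NN accuracy for one chromosome (assumed fitness)."""
    kept = np.flatnonzero(maj_mask)
    if kept.size == 0:
        return 0.0  # no majority instance kept, so the majority class cannot be predicted
    X_all = np.vstack([X_maj, X_min])
    y_all = np.r_[np.zeros(len(X_maj), dtype=int), np.ones(len(X_min), dtype=int)]
    # Reduced training set: selected majority instances plus every minority instance.
    train_idx = np.r_[kept, len(X_maj) + np.arange(len(X_min))]
    train = X_all[train_idx]
    dist = np.linalg.norm(X_all[:, None, :] - train[None, :, :], axis=2)
    # Leave-one-out: an instance must not be its own nearest neighbour.
    dist[train_idx, np.arange(train_idx.size)] = np.inf
    pred = y_all[train_idx[dist.argmin(axis=1)]]
    tpr = np.mean(pred[y_all == 1] == 1)
    tnr = np.mean(pred[y_all == 0] == 0)
    return float(np.sqrt(tpr * tnr))

def hux(a, b, rng):
    """Half-uniform crossover: exchange half of the bits in which the parents differ."""
    diff = rng.permutation(np.flatnonzero(a != b))
    swap = diff[: diff.size // 2]
    c1, c2 = a.copy(), b.copy()
    c1[swap], c2[swap] = b[swap], a[swap]
    return c1, c2

def evolve(X_maj, X_min, pop_size=10, generations=20, seed=0):
    """Simplified CHC-style search over majority-class subsets."""
    rng = np.random.default_rng(seed)
    pop = rng.random((pop_size, len(X_maj))) < 0.5
    pop[0] = True  # keep the "no undersampling" baseline in the initial population
    fit = np.array([gmean_fitness(c, X_maj, X_min) for c in pop])
    for _ in range(generations):
        order = rng.permutation(pop_size)
        children = []
        for i, j in zip(order[::2], order[1::2]):
            children.extend(hux(pop[i], pop[j], rng))
        children = np.array(children)
        child_fit = np.array([gmean_fitness(c, X_maj, X_min) for c in children])
        # Elitist survivor selection: parents and offspring compete together.
        merged = np.vstack([pop, children])
        merged_fit = np.r_[fit, child_fit]
        best = np.argsort(-merged_fit, kind="stable")[:pop_size]
        pop, fit = merged[best], merged_fit[best]
    return pop[0], float(fit[0])
```

In this sketch the chromosome length equals the number of majority instances and every fitness evaluation scans the whole dataset, which is the cost that grows with data size and that the abstract says the Spark-based variations aim to reduce.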
Pages: 2054-2061
Number of pages: 8