EUSC: A clustering-based surrogate model to accelerate evolutionary undersampling in imbalanced classification

Cited by: 19
Authors
Le, Hoang Lam [1]
Landa-Silva, Dario [1]
Galar, Mikel [3]
Garcia, Salvador [2]
Triguero, Isaac [1]
Affiliations
[1] Univ Nottingham, Sch Comp Sci, Computat Optimisat & Learning COL Lab, Nottingham NG8 1BB, England
[2] Univ Granada, Dept Comp Sci & Artificial Intelligence, Granada 18071, Spain
[3] Univ Publ Navarra, Dept Automat & Computat, Campus Arrosadia S-N, Pamplona 31006, Spain
Keywords
Data preprocessing; Evolutionary undersampling; Surrogate models; Imbalanced classification; Fitness approximation; MIXED-INTEGER; STRATEGIES; SELECTION; SMOTE
DOI
10.1016/j.asoc.2020.107033
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Learning from imbalanced datasets is in high demand in real-world applications and is a challenge for standard classifiers, which tend to be biased towards the classes with the majority of the examples. Undersampling approaches reduce the size of the majority class to balance the class distributions. Evolutionary-based approaches are prominent, treating undersampling as a binary optimisation problem that determines which examples are removed. However, their use is limited to small datasets due to fitness evaluation costs. This work proposes a two-stage clustering-based surrogate model that enables evolutionary undersampling to compute fitness values faster. The main novelty lies in the development of a surrogate model for binary optimisation that is based on the meaning (phenotype) of the solutions rather than their binary representation (genotype). We conduct an evaluation on 44 imbalanced datasets, showing that, in comparison with the original evolutionary undersampling, we can save up to 83% of the runtime without significantly deteriorating the classification performance. Crown Copyright (C) 2020 Published by Elsevier B.V. All rights reserved.
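To make the binary formulation in the abstract concrete, here is a minimal sketch, not the authors' implementation. It assumes a chromosome is a 0/1 mask over the majority class, that fitness is the geometric mean of the per-class accuracies of a leave-one-out 1-NN classifier, and that a simple bit-flip hill climber stands in for the evolutionary search. The names `loo_1nn_label`, `g_mean_fitness` and `evolve` are hypothetical. Every fitness call classifies the whole training set, which is the kind of cost a surrogate model would aim to avoid.

```python
import math
import random


def loo_1nn_label(train, query_id, query_x):
    """Label of the nearest training item, skipping the query itself (leave-one-out)."""
    best_label, best_dist = None, math.inf
    for item_id, x, label in train:
        if item_id == query_id:
            continue
        dist = sum((a - b) ** 2 for a, b in zip(x, query_x))
        if dist < best_dist:
            best_dist, best_label = dist, label
    return best_label


def g_mean_fitness(mask, majority, minority):
    """Assumed fitness: geometric mean of the true-negative and true-positive rates.

    mask[i] == 1 keeps majority[i] in the reduced training set; all minority
    examples are always kept. Accuracy is measured on the full original data.
    """
    train = [(("maj", i), x, 0)
             for i, (x, keep) in enumerate(zip(majority, mask)) if keep]
    train += [(("min", i), x, 1) for i, x in enumerate(minority)]
    tn = sum(loo_1nn_label(train, ("maj", i), x) == 0 for i, x in enumerate(majority))
    tp = sum(loo_1nn_label(train, ("min", i), x) == 1 for i, x in enumerate(minority))
    return math.sqrt((tp / len(minority)) * (tn / len(majority)))


def evolve(majority, minority, generations=50, seed=0):
    """Bit-flip hill climber used here as a stand-in for the evolutionary algorithm."""
    rng = random.Random(seed)
    best = [rng.randint(0, 1) for _ in majority]
    best_fit = g_mean_fitness(best, majority, minority)
    rate = 1.0 / len(majority)
    for _ in range(generations):
        # Flip each bit independently with probability `rate`.
        child = [bit ^ int(rng.random() < rate) for bit in best]
        child_fit = g_mean_fitness(child, majority, minority)
        if child_fit >= best_fit:
            best, best_fit = child, child_fit
    return best, best_fit


# Toy data (made up for illustration): five majority points, two minority points.
majority = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.9, 0.9)]
minority = [(1.0, 1.0), (1.0, 0.9)]
mask, fit = evolve(majority, minority)
```

The mask is the genotype; the reduced training set it selects is the phenotype. Two different masks can select sets that behave almost identically, which is one way to read the abstract's argument for a surrogate based on the phenotype rather than on the bit string.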
Pages: 13