EUSC: A clustering-based surrogate model to accelerate evolutionary undersampling in imbalanced classification

Cited by: 19
Authors
Le, Hoang Lam [1]
Landa-Silva, Dario [1]
Galar, Mikel [3]
Garcia, Salvador [2]
Triguero, Isaac [1]
Affiliations
[1] Univ Nottingham, Sch Comp Sci, Computat Optimisat & Learning COL Lab, Nottingham NG8 1BB, England
[2] Univ Granada, Dept Comp Sci & Artificial Intelligence, Granada 18071, Spain
[3] Univ Publ Navarra, Dept Automat & Computat, Campus Arrosadia S-N, Pamplona 31006, Spain
Keywords
Data preprocessing; Evolutionary undersampling; Surrogate models; Imbalanced classification; Fitness approximation; MIXED-INTEGER; STRATEGIES; SELECTION; SMOTE
DOI
10.1016/j.asoc.2020.107033
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Learning from imbalanced datasets is in high demand in real-world applications and poses a challenge for standard classifiers, which tend to be biased towards the class containing the majority of the examples. Undersampling approaches reduce the size of the majority class to balance the class distributions. Evolutionary approaches are prominent among them, treating undersampling as a binary optimisation problem that determines which examples are removed. However, their use is limited to small datasets because of the cost of fitness evaluation. This work proposes a two-stage clustering-based surrogate model that enables evolutionary undersampling to compute fitness values faster. The main novelty lies in a surrogate model for binary optimisation that is based on the meaning of solutions (phenotype) rather than on their binary representation (genotype). We evaluate the approach on 44 imbalanced datasets, showing that, in comparison with the original evolutionary undersampling, it can save up to 83% of the runtime without significantly deteriorating classification performance. Crown Copyright © 2020 Published by Elsevier B.V. All rights reserved.
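The abstract only outlines the method, so the following is a hypothetical sketch of how a phenotype-based, two-stage clustering surrogate could plug into evolutionary undersampling; it is not the authors' EUSC algorithm. It assumes (i) the fitness commonly used in evolutionary undersampling, i.e. the geometric mean of leave-one-out 1-NN accuracy on both classes minus an imbalance penalty with `P = 0.2`; (ii) a first stage that clusters the majority-class examples with k-means, so that a chromosome's phenotype is the fraction of each cluster it keeps; and (iii) a second stage that clusters the population in that phenotype space and evaluates only one representative per cluster. All names (`dist`, `kmeans`, `eus_fitness`, `surrogate_evaluate`) and parameters (`k`, `m`, `penalty`) are illustrative.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach every point to its nearest center.
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        # Update step: move each non-empty cluster's center to its mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def eus_fitness(mask, majority, minority, penalty=0.2):
    """Assumed EUS fitness: leave-one-out 1-NN g-mean minus an imbalance penalty.

    mask[i] == 1 keeps majority[i]; every minority example is always kept.
    """
    kept = [x for x, keep in zip(majority, mask) if keep]
    reference = [(x, 0) for x in kept] + [(x, 1) for x in minority]
    training = [(x, 0) for x in majority] + [(x, 1) for x in minority]
    hits = {0: 0, 1: 0}
    for x, y in training:
        # Leave-one-out: a point never counts as its own neighbour.
        neighbours = [(dist(x, r), ry) for r, ry in reference if r is not x]
        if neighbours and min(neighbours)[1] == y:
            hits[y] += 1
    g = math.sqrt((hits[1] / len(minority)) * (hits[0] / len(majority)))
    if not kept:
        return g - penalty
    return g - abs(1 - len(minority) / len(kept)) * penalty


def surrogate_evaluate(population, majority, minority, maj_labels, k, m, seed=0):
    """Hypothetical two-stage surrogate: one exact evaluation per phenotype cluster."""
    sizes = [maj_labels.count(c) for c in range(k)]

    def phenotype(mask):
        # Stage 1 output: fraction of each majority-class cluster the mask keeps.
        kept = [0] * k
        for keep, c in zip(mask, maj_labels):
            kept[c] += keep
        return [kept[c] / sizes[c] if sizes[c] else 0.0 for c in range(k)]

    # Stage 2: group candidate solutions by phenotype, not by their bit strings.
    groups = kmeans([phenotype(mask) for mask in population], m, seed=seed)
    fitness = [None] * len(population)
    exact_calls = 0
    for g in sorted(set(groups)):
        members = [i for i, lab in enumerate(groups) if lab == g]
        f = eus_fitness(population[members[0]], majority, minority)  # expensive call
        exact_calls += 1
        for i in members:
            fitness[i] = f  # cluster members inherit the representative's fitness
    return fitness, exact_calls


if __name__ == "__main__":
    rng = random.Random(42)
    majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
    minority = [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(8)]
    maj_labels = kmeans(majority, k=4)
    population = [[rng.randint(0, 1) for _ in majority] for _ in range(20)]
    fitness, calls = surrogate_evaluate(population, majority, minority, maj_labels, k=4, m=5)
    print(f"{calls} exact evaluations for {len(population)} candidates")
```

With `m = 5`, at most five exact fitness evaluations are made for twenty candidates. The trade-off is that cluster members inherit their representative's fitness, so a coarser phenotype grouping saves more runtime but approximates fitness more roughly.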
Pages: 13