Collective of Base Classifiers for Mining Imbalanced Data

Cited by: 0
Authors
Jedrzejowicz, Joanna [1]
Jedrzejowicz, Piotr [2]
Affiliations
[1] Univ Gdansk, Inst Informat, Fac Math Phys & Informat, PL-80308 Gdansk, Poland
[2] Gdynia Maritime Univ, Dept Informat Syst, PL-81225 Gdynia, Poland
Keywords
Imbalanced data; Oversampling; Gene expression programming
DOI
10.1007/978-3-031-08754-7_62
Chinese Library Classification number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Mining imbalanced datasets is a challenging problem. In this paper we address it by proposing GEP-NB, a classifier based on the oversampling technique. It combines two learning methods, Gene Expression Programming and Naive Bayes, which cooperate to produce a final prediction. At the pre-processing stage, a simple mechanism is used to generate synthetic minority class examples and balance the training set. Next, two genes, g1 and g2, are evolved using Gene Expression Programming; each is evolved with a different procedure for selecting synthetic minority class examples. If the class predicted by g1 agrees with the class predicted by g2, that decision is final. Otherwise, the final decision is made by the Naive Bayes classifier. The approach is validated in an extensive computational experiment, in which results produced by GEP-NB are compared with the performance of several state-of-the-art classifiers. The comparisons show that GEP-NB offers competitive performance.
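The decision rule described in the abstract (use the common prediction when g1 and g2 agree, otherwise defer to Naive Bayes) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name gep_nb_predict and the three toy classifiers are hypothetical stand-ins, and the evolution of the genes and the oversampling step are not shown, since the abstract does not specify them.

```python
from typing import Callable, Sequence

Example = Sequence[float]
Classifier = Callable[[Example], int]


def gep_nb_predict(x: Example, g1: Classifier, g2: Classifier,
                   naive_bayes: Classifier) -> int:
    """Combine two gene classifiers with a Naive Bayes tie-breaker.

    If g1 and g2 predict the same class, that class is returned.
    Otherwise the prediction of the Naive Bayes classifier is returned.
    """
    p1, p2 = g1(x), g2(x)
    if p1 == p2:
        return p1
    return naive_bayes(x)


# Hypothetical stand-ins for the evolved genes and the Naive Bayes model.
# Real genes would be expressions evolved by Gene Expression Programming.
def toy_g1(x: Example) -> int:
    return int(x[0] > 0.5)


def toy_g2(x: Example) -> int:
    return int(x[1] > 0.5)


def toy_nb(x: Example) -> int:
    return int(x[0] + x[1] > 1.0)


if __name__ == "__main__":
    for sample in ([0.9, 0.8], [0.1, 0.2], [0.9, 0.0], [0.9, 0.3]):
        print(sample, gep_nb_predict(sample, toy_g1, toy_g2, toy_nb))
```

In this sketch the tie-breaker is consulted only for the last two samples, where the toy genes disagree; for the first two the genes' shared prediction is returned directly.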
Pages: 571-585
Number of pages: 15