Collective of Base Classifiers for Mining Imbalanced Data

Cited by: 0
Authors
Jedrzejowicz, Joanna [1]
Jedrzejowicz, Piotr [2]
Affiliations
[1] Univ Gdansk, Inst Informat, Fac Math Phys & Informat, PL-80308 Gdansk, Poland
[2] Gdynia Maritime Univ, Dept Informat Syst, PL-81225 Gdynia, Poland
Keywords
Imbalanced data; Oversampling; Gene expression programming;
DOI
10.1007/978-3-031-08754-7_62
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Mining imbalanced datasets is a challenging problem. In this paper we address it by proposing the GEP-NB classifier, which is based on the oversampling technique. It combines two learning methods, Gene Expression Programming and Naive Bayes, which cooperate to produce a final prediction. At the pre-processing stage, a simple mechanism generates synthetic minority class examples and balances the training set. Next, two genes, g1 and g2, are evolved using Gene Expression Programming. The two genes differ in the procedure used to select synthetic minority class examples. If the class predicted by g1 agrees with the class predicted by g2, their decision is final. Otherwise, the final decision is made by the Naive Bayes classifier. The approach is validated in an extensive computational experiment. Results produced by GEP-NB are compared with the performance of several state-of-the-art classifiers. The comparisons show that GEP-NB offers competitive performance.
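The agreement rule described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `gep_nb_predict` and the three toy classifiers are hypothetical stand-ins, and the evolution of the genes, the oversampling step, and the training of Naive Bayes are all omitted. The sketch only shows how two gene predictions and a fallback classifier might be combined.

```python
from typing import Callable, Sequence

# A classifier is modelled here as any function mapping a feature vector
# to a class label (0 = majority class, 1 = minority class).
Classifier = Callable[[Sequence[float]], int]


def gep_nb_predict(
    g1: Classifier,
    g2: Classifier,
    naive_bayes: Classifier,
    x: Sequence[float],
) -> int:
    """Combine two gene predictions with a Naive Bayes fallback.

    If g1 and g2 agree, their common prediction is returned.
    Otherwise the decision is delegated to the Naive Bayes classifier.
    """
    p1 = g1(x)
    p2 = g2(x)
    if p1 == p2:
        return p1
    return naive_bayes(x)


# Hypothetical toy stand-ins for the evolved genes and the trained
# Naive Bayes model; real ones would be learned from balanced data.
def toy_g1(x: Sequence[float]) -> int:
    return 1 if x[0] > 0.5 else 0


def toy_g2(x: Sequence[float]) -> int:
    return 1 if x[1] > 0.5 else 0


def toy_nb(x: Sequence[float]) -> int:
    return 1 if sum(x) / len(x) > 0.4 else 0


if __name__ == "__main__":
    for sample in ([0.9, 0.8], [0.1, 0.2], [0.9, 0.1], [0.6, 0.0]):
        print(sample, gep_nb_predict(toy_g1, toy_g2, toy_nb, sample))
```

In this sketch the fallback classifier is consulted only when the two genes disagree, which mirrors the rule stated in the abstract.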
Pages: 571 - 585
Number of pages: 15
Related papers
50 in total
  • [31] Fuzzy rough classifiers for class imbalanced multi-instance data
    Vluymans, Sarah
    Tarrago, Danel Sanchez
    Saeys, Yvan
    Cornelis, Chris
    Herrera, Francisco
    PATTERN RECOGNITION, 2016, 53 : 36 - 45
  • [32] Application of the Gravitational Search Algorithm for Constructing Fuzzy Classifiers of Imbalanced Data
    Bardamova, Marina
    Hodashinsky, Ilya
    Konev, Anton
    Shelupanov, Alexander
    SYMMETRY-BASEL, 2019, 11 (12):
  • [33] The effect of imbalanced data class distribution on fuzzy classifiers - Experimental study
    Visa, S
    Ralescu, A
    FUZZ-IEEE 2005: PROCEEDINGS OF THE IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS: BIGGEST LITTLE CONFERENCE IN THE WORLD, 2005 : 749 - 754
  • [34] Performance of Machine Learning Classifiers for Malware Detection Over Imbalanced Data
    Morillo, Paulina
    Bahamonde, Diego
    Tapia, Wilian
    INTELLIGENT SYSTEMS AND APPLICATIONS, VOL 1, INTELLISYS 2023, 2024, 822 : 496 - 507
  • [35] Data Bases, the Base for Data Mining
    Buchsbaum, Christian
    Hoehler-Schlimm, Sabine
    Rehme, Silke
    DATA MINING IN CRYSTALLOGRAPHY, 2010, 134 : 37 - 58
  • [36] An ensemble classifier framework for mining imbalanced data streams
    Ouyang, Zhen-Zheng
    Luo, Jian-Shu
    Hu, Dong-Min
    Wu, Quan-Yuan
    Tien Tzu Hsueh Pao/Acta Electronica Sinica, 2010, 38 (01) : 184 - 189
  • [37] GEP-based classifier for mining imbalanced data
    Jedrzejowicz, Joanna
    Jedrzejowicz, Piotr
    EXPERT SYSTEMS WITH APPLICATIONS, 2021, 164
  • [38] Information granulation based data mining approach for classifying imbalanced data
    Chen, Mu-Chen
    Chen, Long-Sheng
    Hsu, Chun-Chin
    Zeng, Wei-Rong
    INFORMATION SCIENCES, 2008, 178 (16) : 3214 - 3227
  • [39] Optimization of classifiers for data mining based on combinatorial semigroups
    A. V. Kelarev
    J. L. Yearwood
    P. A. Watters
    Semigroup Forum, 2011, 82 : 242 - 251
  • [40] Optimization of classifiers for data mining based on combinatorial semigroups
    Kelarev, A. V.
    Yearwood, J. L.
    Watters, P. A.
    SEMIGROUP FORUM, 2011, 82 (02) : 242 - 251