A preprocessing method combined with an ensemble framework for the multiclass imbalanced data classification

Cited by: 0
Authors
Pavan Kumar M.R. [1]
Jayagopal P. [2]
Affiliations
[1] School of Computer Science and Engineering, Vellore Institute of Technology, Vellore
[2] School of Information Technology & Engineering, Vellore Institute of Technology, Vellore
Keywords
ensemble learning; imbalanced dataset; multiclass imbalance classification; multiple classifier system; oversampling
DOI
10.1080/1206212X.2019.1700335
Abstract
Skewed class distributions appear in many real-world classification problems. In multiclass imbalanced datasets, skewed distributions, underrepresented classes, and multiple overlapping regions degrade the performance of existing classification algorithms. To address this, we combine a novel preprocessing procedure for the minority classes in multiclass imbalanced problems with an ensemble framework. The preprocessing method oversamples the minority classes based on normalized probability, and a stacked generalization ensemble is then used to train the model. The motivation for combining the two is to improve overall classification performance on multiclass imbalanced problems. Experimental results on 20 multiclass imbalanced datasets show that the proposed preprocessing method with the ensemble framework outperforms the representative approaches on 13 datasets for the macro average arithmetic (MAvA) and mean F-measure (MFM) metrics. Against state-of-the-art techniques, the proposed approach performs best on 14 datasets for the MAvA metric and 15 datasets for the MFM metric. © 2019 Informa UK Limited, trading as Taylor & Francis Group.
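The abstract does not spell out the oversampling procedure or the metric formulas, so the following is only a minimal sketch under stated assumptions. It assumes that "normalized probability" means each minority class is drawn with probability proportional to its shortfall relative to the largest class, and that MAvA is the arithmetic mean of per-class recall while MFM is the arithmetic mean of per-class F-measure. The function names are hypothetical, and the stacked generalization step is not shown.

```python
import random
from collections import Counter, defaultdict


def oversample_by_normalized_probability(X, y, seed=0):
    """Hypothetical sketch, not the authors' procedure.

    Adds duplicated minority samples. Each new sample's class is drawn
    with probability equal to that class's shortfall (largest class size
    minus its own size) divided by the total shortfall.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    deficits = {c: target - n for c, n in counts.items() if n < target}
    total = sum(deficits.values())
    if total == 0:  # already balanced, nothing to add
        return list(X), list(y)

    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)

    classes = sorted(deficits)
    probs = [deficits[c] / total for c in classes]  # sums to 1

    X_new, y_new = list(X), list(y)
    for c in rng.choices(classes, weights=probs, k=total):
        X_new.append(rng.choice(by_class[c]))  # duplicate an existing sample
        y_new.append(c)
    return X_new, y_new


def per_class_recall_and_f1(y_true, y_pred):
    """Return (recalls, f1s), one entry per class present in y_true."""
    recalls, f1s = [], []
    for c in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        recall = tp / (tp + fn)  # tp + fn > 0 because c occurs in y_true
        precision = tp / (tp + fp) if tp + fp else 0.0
        denom = precision + recall
        recalls.append(recall)
        f1s.append(2 * precision * recall / denom if denom else 0.0)
    return recalls, f1s


def mava(y_true, y_pred):
    """Assumed definition: arithmetic mean of per-class recall."""
    recalls, _ = per_class_recall_and_f1(y_true, y_pred)
    return sum(recalls) / len(recalls)


def mfm(y_true, y_pred):
    """Assumed definition: arithmetic mean of per-class F-measure."""
    _, f1s = per_class_recall_and_f1(y_true, y_pred)
    return sum(f1s) / len(f1s)
```

Because classes are drawn at random, the resampled set is balanced only in expectation. Both metrics weight every class equally, which is why they suit imbalanced problems better than plain accuracy.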
Pages: 1178-1185
Page count: 7