The Effects of Class Imbalance and Training Data Size on Classifier Learning: An Empirical Study

Authors
Zheng W. [1]
Jin M. [1]
Affiliation
[1] Graduate School of Culture and Information Science, Doshisha University, 1-3 Tatara Miyakodani, Kyoto, Kyotanabe
Keywords
Class imbalance; Classifier performance; Hyperparameter tuning; Training data size
DOI
10.1007/s42979-020-0074-0
Abstract
This study examines the effects of class imbalance and training data size on the predictive performance of classifiers. An empirical study was performed on ten classifiers drawn from seven categories, all of which are frequently employed and have been identified as efficient. In addition, comprehensive hyperparameter tuning was carried out on every dataset to maximize the performance of each classifier. The results indicated that (1) naïve Bayes, logistic regression and the logit leaf model are less susceptible to class imbalance, although their predictive performance is relatively poor; (2) the ensemble classifiers AdaBoost, XGBoost and parRF are considerably less stable under class imbalance, although they achieved superior predictive accuracy; (3) for all of the classifiers employed in this study, accuracy decreased as soon as the class imbalance skew reached a certain point, 0.10; note that although datasets with a balanced class distribution would be the ideal condition for maximizing classifier performance, if the skew is larger than 0.10, comprehensive hyperparameter tuning may be able to eliminate the effect of class imbalance; (4) no classifier proved robust to changes in training data size; (5) CART is the last choice among the ten classifiers. © 2020, Springer Nature Singapore Pte Ltd.
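
The abstract varies two factors, class imbalance skew and training data size, but does not say how the training sets were constructed. As a rough illustration only, the sketch below draws a training subset of a chosen size and skew from a labelled pool. It assumes that "skew" means the proportion of minority-class examples (so 0.50 is balanced and 0.10 is heavily imbalanced); the function name `subsample_to_skew` and all parameter names are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def subsample_to_skew(labels, skew, n_total, minority=1, seed=0):
    """Return indices of a subset of size n_total in which about `skew`
    of the examples carry the minority label.

    Assumption (not from the paper): skew = minority-class proportion.
    """
    if not 0.0 < skew <= 0.5:
        raise ValueError("skew must be in (0, 0.5]")
    rng = random.Random(seed)  # seeded for repeatable draws
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]

    # Number of minority and majority examples needed for this setting.
    n_min = max(1, round(skew * n_total))
    n_maj = n_total - n_min
    if n_min > len(minority_idx) or n_maj > len(majority_idx):
        raise ValueError("not enough examples for the requested size and skew")

    # Sample each class without replacement, then shuffle the result.
    chosen = rng.sample(minority_idx, n_min) + rng.sample(majority_idx, n_maj)
    rng.shuffle(chosen)
    return chosen


if __name__ == "__main__":
    # Toy label pool: 500 minority (1) and 1500 majority (0) examples.
    labels = [1] * 500 + [0] * 1500
    for skew in (0.5, 0.3, 0.1, 0.05):
        for n_total in (200, 1000):
            idx = subsample_to_skew(labels, skew, n_total)
            counts = Counter(labels[i] for i in idx)
            print(skew, n_total, counts[1], counts[0])
```

Each (skew, size) pair produced this way could then be passed to any classifier and tuning routine, which keeps the data-construction step separate from the choice of model.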