The Effects of Class Imbalance and Training Data Size on Classifier Learning: An Empirical Study

Authors
Zheng W. [1]
Jin M. [1]
Affiliation
[1] Graduate School of Culture and Information Science, Doshisha University, 1-3 Tatara Miyakodani, Kyoto, Kyotanabe
Keywords
Class imbalance; Classifier performance; Hyperparameter tuning; Training data size
DOI
10.1007/s42979-020-0074-0
Abstract
This study examines the effects of class imbalance and training data size on the predictive performance of classifiers. An empirical study was performed on ten classifiers drawn from seven categories, all of which are frequently employed and have been identified as efficient. In addition, comprehensive hyperparameter tuning was carried out for every dataset to maximize the performance of each classifier. The results indicated that (1) naïve Bayes, logistic regression and the logit leaf model are less susceptible to class imbalance, although their predictive performance is relatively poor; (2) the ensemble classifiers AdaBoost, XGBoost and parRF are considerably less stable under class imbalance, although they achieved superior predictive accuracy; (3) for all of the classifiers employed in this study, accuracy decreased as soon as the class imbalance skew reached a certain point, 0.10; note that although datasets with a balanced class distribution would be the ideal condition for maximizing classifier performance, if the skew is larger than 0.10, comprehensive hyperparameter tuning may be able to eliminate the effect of class imbalance; (4) no single classifier proved robust to changes in training data size; (5) CART is the last choice among the ten classifiers. © 2020, Springer Nature Singapore Pte Ltd.
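As a rough illustration of the two experimental factors named in the abstract, the sketch below draws a subsample of a chosen size with a chosen class skew from a binary-labelled dataset. It is not the authors' procedure: the abstract does not define "skew", so treating it as the minority-class proportion is an assumption, and the function name `subsample_with_skew` and its parameters are hypothetical.

```python
import random


def subsample_with_skew(labels, skew, n_total, seed=0):
    """Return shuffled indices of a subsample of size n_total.

    ASSUMPTION: `skew` is read as the minority-class (label 1) proportion;
    the abstract itself does not define the term.
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]

    n_minority = round(skew * n_total)
    n_majority = n_total - n_minority
    if n_minority > len(minority) or n_majority > len(majority):
        raise ValueError("not enough examples for the requested size and skew")

    # Sample without replacement from each class, then mix the two groups.
    idx = rng.sample(minority, n_minority) + rng.sample(majority, n_majority)
    rng.shuffle(idx)
    return idx


# Toy data (hypothetical): 500 positives followed by 500 negatives.
labels = [1] * 500 + [0] * 500
idx = subsample_with_skew(labels, skew=0.10, n_total=200)
minority_share = sum(labels[i] for i in idx) / len(idx)
```

Varying `n_total` with `skew` held fixed, and vice versa, gives a grid of training sets of the kind such a study would compare classifiers on.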