The Effects of Class Imbalance and Training Data Size on Classifier Learning: An Empirical Study

Cited by: 0
Authors
Zheng W. [1]
Jin M. [1]
Affiliation
[1] Graduate School of Culture and Information Science, Doshisha University, 1-3 Tatara Miyakodani, Kyoto, Kyotanabe
Keywords
Class imbalance; Classifier performance; Hyperparameter tuning; Training data size;
DOI
10.1007/s42979-020-0074-0
Abstract
This study examines the effects of class imbalance and training data size on the predictive performance of classifiers. An empirical study was conducted on ten frequently used classifiers, drawn from seven categories, that have been identified as efficient. In addition, comprehensive hyperparameter tuning was performed on every dataset to maximize the performance of each classifier. The results indicate that (1) naïve Bayes, logistic regression and the logit leaf model are less susceptible to class imbalance, but have relatively poor predictive performance; (2) the ensemble classifiers AdaBoost, XGBoost and parRF are considerably less stable under class imbalance, but achieve superior predictive accuracy; (3) for all classifiers employed in this study, accuracy decreased once the class imbalance skew reached 0.10; note that although datasets with a balanced class distribution would be the ideal condition for maximizing classifier performance, if the skew is larger than 0.10, comprehensive hyperparameter tuning may be able to eliminate the effect of class imbalance; (4) no classifier proved robust to changes in training data size; (5) CART is the last choice among the ten classifiers. © 2020, Springer Nature Singapore Pte Ltd.
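As a rough illustration of how a dataset with a controlled class imbalance might be prepared for an experiment of this kind, the sketch below subsamples the minority class until it makes up a requested share of the data. It is not the authors' procedure: the abstract does not define "skew", so reading it as the minority-class proportion is an assumption, and the function name `subsample_to_skew` and the toy labels are hypothetical.

```python
import random
from collections import Counter


def subsample_to_skew(labels, minority_label, skew, seed=0):
    """Return indices of a subsample whose minority proportion is about `skew`.

    Assumption: `skew` means n_minority / (n_minority + n_majority).
    All majority examples are kept; minority examples are drawn
    without replacement until the requested proportion is reached.
    """
    if not 0.0 < skew <= 0.5:
        raise ValueError("skew must be in (0, 0.5]")
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    # Solve n_min / (n_min + n_maj) = skew for n_min.
    n_min = round(skew * len(majority) / (1.0 - skew))
    if n_min > len(minority):
        raise ValueError("not enough minority examples for this skew")
    chosen = rng.sample(minority, n_min) + majority
    rng.shuffle(chosen)
    return chosen


# Hypothetical toy labels: 500 positives (minority), 900 negatives.
labels = [1] * 500 + [0] * 900
idx = subsample_to_skew(labels, minority_label=1, skew=0.10)
counts = Counter(labels[i] for i in idx)
```

Repeating this for a range of skew values, and separately for a range of subsample sizes, would give the kind of grid over which classifiers can be tuned and compared; the tuning and evaluation steps themselves are left out here.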
Related papers
50 in total
  • [21] Affinity based fuzzy kernel ridge regression classifier for binary class imbalance learning
    Hazarika, Barenya Bikash
    Gupta, Deepak
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 117
  • [22] Consolidated Tree classifier learning in a car insurance fraud detection domain with class imbalance
    Pérez, JM
    Muguerza, J
    Arbelaitz, O
    Gurrutxaga, I
    Martín, JI
    PATTERN RECOGNITION AND DATA MINING, PT 1, PROCEEDINGS, 2005, 3686 : 381 - 389
  • [23] Parameter optimization of kernel-based one-class classifier on imbalance learning
    Zhuang, Ling
    Dai, Honghua
    Journal of Computers (Finland), 2006, 1 (07) : 32 - 40
  • [24] Effect of class imbalance in heterogeneous network embedding: An empirical study
    Anil, Akash
    Singh, Sanasam Ranbir
    JOURNAL OF INFORMETRICS, 2020, 14 (02)
  • [25] An empirical study on the class imbalance handling techniques for different diseases
    Rhmann W.
    Soft Computing, 2024, 28 (19) : 11439 - 11456
  • [26] Feature Selection and Resampling in Class Imbalance Learning: Which Comes First? An Empirical Study in the Biological Domain
    Zhang, Chongsheng
    Bi, Jingjun
    Soda, Paolo
    2017 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2017, : 933 - 938
  • [27] An Empirical Analysis of Attribute Skewness over Class Imbalance on Probabilistic Neural Network and Naive Bayes Classifier
    Shahadat, Nazmul
    Pal, Biprodip
    2015 INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION ENGINEERING (ICCIE), 2015, : 150 - 153
  • [28] Class Imbalance Issue in Software Defect Prediction Models by various Machine Learning Techniques: An Empirical Study
    Pandey, Sushant Kumar
    Tripathi, Anil Kumar
    2021 8TH INTERNATIONAL CONFERENCE ON SMART COMPUTING AND COMMUNICATIONS (ICSCC), 2021, : 58 - 63
  • [29] A learning method for the class imbalance problem with medical data sets
    Li, Der-Chiang
    Liu, Chiao-Wen
    Hu, Susan C.
    COMPUTERS IN BIOLOGY AND MEDICINE, 2010, 40 (05) : 509 - 518
  • [30] Class Imbalance Robust Incremental LPSVM for Data Streams Learning
    Zhu, Lei
    Pang, Shaoning
    Chen, Gang
    Sarrafzadeh, Abdolhossein
    2012 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2012,