The Effects of Class Imbalance and Training Data Size on Classifier Learning: An Empirical Study

Authors
Zheng W. [1]
Jin M. [1]
Affiliation
[1] Graduate School of Culture and Information Science, Doshisha University, 1-3 Tatara Miyakodani, Kyotanabe, Kyoto
Keywords
Class imbalance; Classifier performance; Hyperparameter tuning; Training data size
DOI
10.1007/s42979-020-0074-0
Abstract
This study examines the effects of class imbalance and training data size on the predictive performance of classifiers. An empirical study was performed on ten classifiers from seven categories that are frequently used and have been identified as efficient. In addition, comprehensive hyperparameter tuning was performed on every dataset to maximize the performance of each classifier. The results indicated that (1) naïve Bayes, logistic regression and the logit leaf model are less susceptible to class imbalance, although their predictive performance is relatively poor; (2) the ensemble classifiers AdaBoost, XGBoost and parRF are considerably less stable under class imbalance, although they achieved superior predictive accuracy; (3) for all ten classifiers, accuracy decreased once the class imbalance skew reached 0.10; although a balanced class distribution would be the ideal condition for maximizing classifier performance, when the skew is larger than 0.10, comprehensive hyperparameter tuning may be able to eliminate the effect of class imbalance; (4) none of the classifiers proved robust to changes in training data size; (5) CART is the least preferable of the ten classifiers. © 2020, Springer Nature Singapore Pte Ltd.
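The study varies two factors, the class imbalance skew and the training data size. The abstract does not describe the exact sampling procedure, so the following Python sketch is only an illustration under stated assumptions: the task is binary, "skew" is taken to mean the proportion of minority-class examples in the training set, and the helper names make_subset and minority_share are hypothetical, not taken from the paper.

```python
import random
from collections import Counter


def make_subset(labels, skew, size, minority=1, seed=0):
    """Return indices of a random subset of `size` rows in which the
    minority class makes up a fraction `skew` of the rows.

    Assumption (not from the paper): skew = minority count / total count.
    """
    if not 0 < skew <= 0.5:
        raise ValueError("skew must be in (0, 0.5]")
    rng = random.Random(seed)  # seeded so the subset is reproducible
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    n_min = round(size * skew)  # minority rows needed for the target skew
    n_maj = size - n_min        # remaining rows come from the majority class
    if n_min > len(minority_idx) or n_maj > len(majority_idx):
        raise ValueError("not enough rows for the requested skew and size")
    # Sample each class without replacement, then shuffle the combined indices.
    chosen = rng.sample(minority_idx, n_min) + rng.sample(majority_idx, n_maj)
    rng.shuffle(chosen)
    return chosen


def minority_share(labels, minority=1):
    """Observed proportion of the minority class in `labels`."""
    counts = Counter(labels)
    return counts[minority] / sum(counts.values())


if __name__ == "__main__":
    labels = [1] * 300 + [0] * 700
    idx = make_subset(labels, skew=0.10, size=500)
    print(len(idx), minority_share([labels[i] for i in idx]))
```

In a grid over, for example, several skew values and several training sizes, each subset would then be used to tune and evaluate every classifier; that evaluation loop is left out here because the paper's tuning protocol is not described in the abstract.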