A systematic study of the class imbalance problem in convolutional neural networks

Cited by: 1384
Authors
Buda, Mateusz [1, 2]
Maki, Atsuto [2]
Mazurowski, Maciej A. [1, 3]
Affiliations
[1] Duke Univ, Dept Radiol, Sch Med, Durham, NC 27710 USA
[2] KTH Royal Inst Technol, Sch Elect Engn & Comp Sci, Stockholm, Sweden
[3] Duke Univ, Dept Elect & Comp Engn, Durham, NC USA
Keywords
Class imbalance; Convolutional neural networks; Deep learning; Image classification; Novelty detection approach; Classification; SMOTE; Classifiers
DOI
10.1016/j.neunet.2018.07.011
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this study, we systematically investigate the impact of class imbalance on the classification performance of convolutional neural networks (CNNs) and compare frequently used methods to address the issue. Class imbalance is a common problem that has been comprehensively studied in classical machine learning, yet very limited systematic research is available in the context of deep learning. In our study, we use three benchmark datasets of increasing complexity, MNIST, CIFAR-10 and ImageNet, to investigate the effects of imbalance on classification and perform an extensive comparison of several methods to address the issue: oversampling, undersampling, two-phase training, and thresholding that compensates for prior class probabilities. Our main evaluation metric is the area under the receiver operating characteristic curve (ROC AUC) adjusted to multi-class tasks, since the overall accuracy metric is associated with notable difficulties in the context of imbalanced data. Based on the results of our experiments, we conclude that (i) the effect of class imbalance on classification performance is detrimental; (ii) the method of addressing class imbalance that emerged as dominant in almost all analyzed scenarios was oversampling; (iii) oversampling should be applied to the level that completely eliminates the imbalance, whereas the optimal undersampling ratio depends on the extent of imbalance; (iv) as opposed to some classical machine learning models, oversampling does not cause overfitting of CNNs; (v) thresholding should be applied to compensate for prior class probabilities when the overall number of properly classified cases is of interest. (c) 2018 Elsevier Ltd. All rights reserved.
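Two of the methods named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' code. It assumes random oversampling means replicating minority-class samples until every class matches the largest one, and that thresholding means dividing each predicted class probability by that class's training-set frequency and renormalizing. The function names `random_oversample` and `threshold_by_priors` are hypothetical.

```python
import numpy as np


def random_oversample(labels, rng):
    """Return sample indices in which every class is as frequent as the largest class.

    Minority-class indices are repeated by sampling with replacement.
    """
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.max()
    picked = []
    for cls, count in zip(classes, counts):
        idx = np.flatnonzero(labels == cls)
        # Draw the missing number of samples from this class (zero for the largest class).
        extra = rng.choice(idx, size=target - count, replace=True)
        picked.append(np.concatenate([idx, extra]))
    return np.concatenate(picked)


def threshold_by_priors(probs, train_labels, num_classes):
    """Divide predicted probabilities by training-set class priors, then renormalize.

    Assumes every class occurs at least once in train_labels (no zero priors).
    """
    priors = np.bincount(train_labels, minlength=num_classes) / len(train_labels)
    adjusted = np.asarray(probs, dtype=float) / priors
    return adjusted / adjusted.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train_labels = np.array([0] * 6 + [1] * 2 + [2] * 1)  # imbalanced toy labels

    balanced_idx = random_oversample(train_labels, rng)
    print(np.bincount(train_labels[balanced_idx]))  # per-class counts after oversampling

    probs = np.array([[0.6, 0.3, 0.1]])  # hypothetical softmax output for one image
    print(threshold_by_priors(probs, train_labels, num_classes=3))
```

In a training pipeline the returned indices would select the images fed to the network, while the prior correction applies only at inference time and leaves the trained model unchanged.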
Pages: 249-259
Page count: 11