A systematic study of the class imbalance problem in convolutional neural networks

Cited by: 1384
Authors
Buda, Mateusz [1, 2]
Maki, Atsuto [2]
Mazurowski, Maciej A. [1, 3]
Affiliations
[1] Duke Univ, Dept Radiol, Sch Med, Durham, NC 27710 USA
[2] KTH Royal Inst Technol, Sch Elect Engn & Comp Sci, Stockholm, Sweden
[3] Duke Univ, Dept Elect & Comp Engn, Durham, NC USA
Keywords
Class imbalance; Convolutional neural networks; Deep learning; Image classification; Novelty detection approach; Classification; SMOTE; Classifiers
DOI
10.1016/j.neunet.2018.07.011
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this study, we systematically investigate the impact of class imbalance on classification performance of convolutional neural networks (CNNs) and compare frequently used methods to address the issue. Class imbalance is a common problem that has been comprehensively studied in classical machine learning, yet very limited systematic research is available in the context of deep learning. In our study, we use three benchmark datasets of increasing complexity, MNIST, CIFAR-10 and ImageNet, to investigate the effects of imbalance on classification and perform an extensive comparison of several methods to address the issue: oversampling, undersampling, two-phase training, and thresholding that compensates for prior class probabilities. Our main evaluation metric is area under the receiver operating characteristic curve (ROC AUC) adjusted to multi-class tasks since overall accuracy metric is associated with notable difficulties in the context of imbalanced data. Based on results from our experiments we conclude that (i) the effect of class imbalance on classification performance is detrimental; (ii) the method of addressing class imbalance that emerged as dominant in almost all analyzed scenarios was oversampling; (iii) oversampling should be applied to the level that completely eliminates the imbalance, whereas the optimal undersampling ratio depends on the extent of imbalance; (iv) as opposed to some classical machine learning models, oversampling does not cause overfitting of CNNs; (v) thresholding should be applied to compensate for prior class probabilities when overall number of properly classified cases is of interest. (c) 2018 Elsevier Ltd. All rights reserved.
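Two of the methods named in the abstract, oversampling to the level that eliminates the imbalance and thresholding that compensates for prior class probabilities, can be illustrated with a small NumPy sketch. This is not the authors' code: the function names `random_oversample` and `threshold_by_priors` are hypothetical, the oversampling shown is plain random duplication, and the thresholding is assumed to mean dividing each predicted class probability by the class's training-set frequency and renormalizing.

```python
import numpy as np


def random_oversample(X, y, seed=0):
    """Randomly duplicate samples of smaller classes until every class
    has as many samples as the largest one (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    picked = []
    for cls, count in zip(classes, counts):
        cls_idx = np.flatnonzero(y == cls)
        # Draw the missing samples with replacement from the same class.
        extra = rng.choice(cls_idx, size=target - count, replace=True)
        picked.append(np.concatenate([cls_idx, extra]))
    picked = np.concatenate(picked)
    return X[picked], y[picked]


def threshold_by_priors(probs, train_labels, n_classes):
    """Divide predicted probabilities by training-set class priors and
    renormalize (assumes every class occurs in train_labels)."""
    priors = np.bincount(train_labels, minlength=n_classes) / len(train_labels)
    adjusted = probs / priors
    return adjusted / adjusted.sum(axis=1, keepdims=True)


# Toy data: class 0 has four samples, classes 1 and 2 have one each.
X = np.arange(12).reshape(6, 2)
y = np.array([0, 0, 0, 0, 1, 2])
X_bal, y_bal = random_oversample(X, y)

# A prediction that favors the majority class before prior correction.
train_labels = np.array([0] * 9 + [1])
corrected = threshold_by_priors(np.array([[0.6, 0.4]]), train_labels, 2)
```

In the toy example the balanced set holds four samples of each class, and the corrected prediction favors the minority class because its prior (0.1) is much smaller than the majority prior (0.9).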
Pages: 249-259
Number of pages: 11