Data imbalance in classification: Experimental evaluation

Cited by: 400
Authors
Thabtah, Fadi [1]
Hammoud, Suhel [2]
Kamalov, Firuz [3]
Gonsalves, Amanda [1]
Affiliations
[1] Manukau Inst Technol, Corner Manukau Stn Rd,Davies Ave, Auckland 2104, New Zealand
[2] Univ Kalamoon, Deir Atiyah An Nabek Dist Rif Dimashq Governorate, Deir Atiyah, Syria
[3] Canadian Univ Dubai, Sheikh Zayed Rd, Dubai, U Arab Emirates
Keywords
Classification; Class imbalance; Data analysis; Machine learning; Statistical analysis; Supervised learning; Features
DOI
10.1016/j.ins.2019.11.004
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The advent of Big Data has ushered in a new era of scientific breakthroughs. One of the common issues that affects raw data is the class imbalance problem, which refers to an imbalanced distribution of values of the response variable. This issue is present in fraud detection, network intrusion detection, medical diagnostics, and a number of other fields where negatively labeled instances significantly outnumber positively labeled instances. Modern machine learning techniques struggle with imbalanced data because they focus on minimizing the error rate for the majority class while ignoring the minority class. The goal of our paper is to demonstrate the effects of class imbalance on classification models. Concretely, we study the impact of varying class imbalance ratios on classifier accuracy. By highlighting the precise nature of the relationship between the degree of class imbalance and the corresponding effects on classifier performance, we hope to help researchers better tackle the problem. To this end, we carry out extensive experiments using 10-fold cross-validation on a large number of datasets. In particular, we determine that the relationship between the class imbalance ratio and the accuracy is convex. © 2019 Elsevier Inc. All rights reserved.
Pages: 429-441
Page count: 13
Related papers
50 in total
  • [41] Swift Imbalance Data Classification using SMOTE and Extreme Learning Machine
    Rustogi, Rishabh
    Prasad, Ayush
    2019 SECOND INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE IN DATA SCIENCE (ICCIDS 2019), 2019,
  • [42] Large Imbalance Data Classification Based on MapReduce for Traffic Accident Prediction
    Park, Seoung-hun
    Ha, Young-guk
    2014 EIGHTH INTERNATIONAL CONFERENCE ON INNOVATIVE MOBILE AND INTERNET SERVICES IN UBIQUITOUS COMPUTING (IMIS), 2014, : 45 - 49
  • [43] Development of Evaluation Metrics that Consider Data Imbalance between Classes in Facies Classification (vol 23, pg 131, 2020)
    Pieuchot, M.
    GEOPHYSICS AND GEOPHYSICAL EXPLORATION, 2020, 23 (04): : 267 - 267
  • [44] Neural classification of HEP experimental data
    Vitabile, Salvatore
    Pilato, Giovanni
    Vassallo, Giorgio
    Siniscalchi, S. M.
    Gentile, Antonio
    Sorbello, Filippo
    BIOLOGICAL AND ARTIFICIAL INTELLIGENCE ENVIRONMENTS, 2005, : 149 - 155
  • [45] AN EXPERIMENTAL EVALUATION OF NEURAL NETWORKS FOR CLASSIFICATION
    SUBRAMANIAN, V
    HUNG, MS
    HU, MY
    COMPUTERS & OPERATIONS RESEARCH, 1993, 20 (07) : 769 - 782
  • [46] An Experimental Evaluation of Some Classification Methods
    M. Doumpos
    E. Chatzi
    C. Zopounidis
    Journal of Global Optimization, 2006, 36 : 33 - 50
  • [47] An Experimental Evaluation of Boosting Methods for Classification
    Stollhoff, R.
    Sauerbrei, W.
    Schumacher, M.
    METHODS OF INFORMATION IN MEDICINE, 2010, 49 (03) : 219 - 229
  • [48] Evaluation of Data Imbalance Algorithms on the Prediction of Credit Card Fraud
    Otoo, Godlove
    Appati, Justice Kwame
    Yaokumah, Winfred
    Soli, Michael Agbo Tettey
    Nwolley, Stephane, Jr.
    Ludu, Julius Yaw
    INTERNATIONAL JOURNAL OF INTELLIGENT INFORMATION TECHNOLOGIES, 2021, 17 (04)
  • [49] An experimental evaluation of some classification methods
    Doumpos, M.
    Chatzi, E.
    Zopounidis, C.
    JOURNAL OF GLOBAL OPTIMIZATION, 2006, 36 (01) : 33 - 50
  • [50] An Experimental Evaluation of LLM on Image Classification
    Wu, Jiaxuan
    Tang, Xushuo
    Yang, Zhengyi
    Hao, Kongzhang
    Lai, Longbin
    Liu, Yongfei
    DATABASES THEORY AND APPLICATIONS, ADC 2024, 2025, 15449 : 506 - 518