A Comparison Study of Cost-sensitive Learning and Sampling Methods on Imbalanced Data Sets

Cited by: 4
Authors
Zhang, Jinwei [1 ]
Lu, Huijuan [1 ]
Chen, Wutao [1 ]
Lu, Yi [2 ]
Affiliations
[1] China Jiliang Univ, Coll Informat Engn, Hangzhou 310018, Zhejiang, Peoples R China
[2] Prairie View A&M Univ, Dept Comp Sci, Prairie View, TX 77446 USA
Funding
National Natural Science Foundation of China; Zhejiang Provincial Natural Science Foundation;
Keywords
misclassification cost; cost-sensitive learning; over-sampling; under-sampling;
DOI
10.4028/www.scientific.net/AMR.271-273.1291
Chinese Library Classification number
G40 [Education];
Discipline classification code
040101 ; 120403 ;
Abstract
A classifier built from a data set with a highly skewed class distribution generally predicts an unknown sample as the majority class much more often than as the minority class. This happens because classifiers are typically designed to maximize overall classification accuracy. In this paper, we compare three classification methods for data sets in which the class distribution is imbalanced and the misclassification costs are non-uniform: cost-sensitive learning, in which the misclassification cost is embedded in the algorithm; over-sampling; and under-sampling. The aim is to determine which of the three produces the best overall classification under any circumstance. We reach the following conclusions: 1. Cost-sensitive learning is suitable for classifying imbalanced data sets. It outperforms the sampling methods overall and is more stable than they are, except when the data set is quite small. 2. If the data set is highly skewed or quite small, over-sampling methods may be better.
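The three approaches named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`random_oversample`, `random_undersample`, `min_cost_label`) and the cost values are hypothetical, the sampling is plain random duplication or removal, and the cost-sensitive part is a simple minimum-expected-cost decision rule applied to class probabilities, whereas the paper embeds the cost in the learning algorithm itself.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples of the smaller classes
    until every class is as large as the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class so that every class
    is as small as the smallest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        pool = [x for x, lab in zip(X, y) if lab == label]
        for x in rng.sample(pool, target):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


def min_cost_label(probs, cost):
    """Pick the label with the lowest expected misclassification cost.

    probs: dict mapping label -> estimated P(label | x)
    cost:  dict mapping (true_label, predicted_label) -> cost
    """
    labels = list(probs)
    return min(
        labels,
        key=lambda pred: sum(probs[t] * cost[(t, pred)] for t in labels),
    )


# Toy imbalanced data: 8 majority samples (0), 2 minority samples (1).
X = list(range(10))
y = [0] * 8 + [1] * 2

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)

# Hypothetical costs: missing a minority sample is 10x worse than a false alarm.
cost = {(0, 0): 0, (1, 1): 0, (0, 1): 1, (1, 0): 10}
label = min_cost_label({0: 0.8, 1: 0.2}, cost)
```

With these assumed costs, a sample with an estimated minority probability of only 0.2 is still assigned to the minority class, because the expected cost of predicting the majority class (0.2 × 10 = 2) exceeds that of predicting the minority class (0.8 × 1 = 0.8).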
Pages: 1291 / +
Number of pages: 3
Related papers
50 in total
  • [21] Local cost sensitive learning for handling imbalanced data sets
    Karagiannopoulos, M. G.
    Anyfantis, D. S.
    Kotsiantis, S. B.
    Pintelas, P. E.
    [J]. 2007 MEDITERRANEAN CONFERENCE ON CONTROL & AUTOMATION, VOLS 1-4, 2007, : 235 - 240
  • [22] Applying Adaptive Over-sampling Technique Based on Data Density and Cost-Sensitive SVM to Imbalanced Learning
    Wang, Senzhang
    Li, Zhoujun
    Chao, Wenhan
    Cao, Qinghua
    [J]. 2012 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2012,
  • [23] Cost-Sensitive Ensemble Learning for Highly Imbalanced Classification
    Johnson, Justin M.
    Khoshgoftaar, Taghi M.
    [J]. 2022 21ST IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, ICMLA, 2022, : 1427 - 1434
  • [24] Privacy-preserving Cost-sensitive Federated Learning from Imbalanced Data
    Liu, Xiaowei
    Yao, Yuanzhi
    Ma, Yuting
    Yu, Nenghai
    [J]. 2022 IEEE 34TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, ICTAI, 2022, : 20 - 27
  • [25] COST-SENSITIVE SPFCNN MINER FOR CLASSIFICATION OF IMBALANCED DATA
    Zhao, Linchang
    Shang, Zhaowei
    Zhao, Ling
    Wei, Yu
    Tang, Yuan Yan
    [J]. PROCEEDINGS OF 2019 INTERNATIONAL CONFERENCE ON WAVELET ANALYSIS AND PATTERN RECOGNITION (ICWAPR), 2019, : 51 - 57
  • [26] Cost-sensitive Fuzzy Multiple Kernel Learning for imbalanced problem
    Wang, Zhe
    Wang, Bolu
    Cheng, Yang
    Li, Dongdong
    Zhang, Jing
    [J]. NEUROCOMPUTING, 2019, 366 : 178 - 193
  • [27] Bayesian Optimization Cost-Sensitive XGBoost Learning Algorithm for Imbalanced Data in Semiconductor Industry
    Shamsudin, Haziqah
    Yusof, Umi Kalsom
    Kashif, Fizza
    Isa, Iza Sazanita
    [J]. JORDAN JOURNAL OF ELECTRICAL ENGINEERING, 2023, 9 (04): : 552 - 565
  • [28] Cost-sensitive continuous ensemble kernel learning for imbalanced data streams with concept drift
    Chen, Yingying
    Yang, Xiaowei
    Dai, Hong-Liang
    [J]. KNOWLEDGE-BASED SYSTEMS, 2024, 284
  • [29] Cost-Sensitive Variational Autoencoding Classifier for Imbalanced Data Classification
    Liu, Fen
    Qian, Quan
    [J]. ALGORITHMS, 2022, 15 (05)
  • [30] A Statistical Approach to Cost-Sensitive AdaBoost for Imbalanced Data Classification
    Bei, Honghan
    Wang, Yajie
    Ren, Zhaonuo
    Jiang, Shuo
    Li, Keran
    Wang, Wenyang
    [J]. MATHEMATICAL PROBLEMS IN ENGINEERING, 2021, 2021