A Comparison Study of Cost-sensitive Learning and Sampling Methods on Imbalanced Data Sets

Cited by: 4
Authors
Zhang, Jinwei [1 ]
Lu, Huijuan [1 ]
Chen, Wutao [1 ]
Lu, Yi [2 ]
Affiliations
[1] China Jiliang Univ, Coll Informat Engn, Hangzhou 310018, Zhejiang, Peoples R China
[2] Prairie View A&M Univ, Dept Comp Sci, Prairie View, TX 77446 USA
Funding
National Natural Science Foundation of China; Natural Science Foundation of Zhejiang Province;
Keywords
misclassification cost; cost-sensitive learning; over-sampling; under-sampling;
DOI
10.4028/www.scientific.net/AMR.271-273.1291
Chinese Library Classification number
G40 [Education];
Subject classification codes
040101 ; 120403 ;
Abstract
A classifier built from a data set with a highly skewed class distribution generally predicts an unknown sample as the majority class much more frequently than as the minority class. This is because the classifier is designed to achieve the highest overall classification accuracy. We compare three classification methods for data sets in which the class distribution is imbalanced and the misclassification costs are non-uniform: a cost-sensitive learning method, in which the misclassification cost is embedded in the algorithm, an over-sampling method, and an under-sampling method. The aim is to determine which of the three produces the best overall classification under different circumstances. We reach the following conclusions: 1. Cost-sensitive learning is suitable for classifying imbalanced data sets. It outperforms the sampling methods overall and is more stable than they are, except when the data set is quite small. 2. If the data set is highly skewed or quite small, over-sampling methods may be better.
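The three families of methods named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`random_over_sample`, `random_under_sample`, `min_cost_prediction`) are hypothetical, the sampling shown is plain random duplication and removal, and the cost-sensitive part is a simple minimum-expected-cost decision rule, which is only one common way of taking misclassification cost into account.

```python
import random
from collections import Counter


def random_over_sample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of smaller classes until every
    class is as large as the largest one (illustrative helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y


def random_under_sample(samples, labels, seed=0):
    """Keep a random subset of each class so that every class is as
    small as the smallest one (illustrative helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_x, out_y = [], []
    for cls in counts:
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for x in rng.sample(pool, target):
            out_x.append(x)
            out_y.append(cls)
    return out_x, out_y


def min_cost_prediction(p_minority, cost_fn, cost_fp):
    """Pick the label with the lower expected misclassification cost.

    p_minority: estimated probability that the sample is minority class.
    cost_fn: assumed cost of labelling a minority sample as majority.
    cost_fp: assumed cost of labelling a majority sample as minority.
    """
    cost_if_predict_majority = p_minority * cost_fn
    cost_if_predict_minority = (1 - p_minority) * cost_fp
    if cost_if_predict_minority < cost_if_predict_majority:
        return "minority"
    return "majority"


if __name__ == "__main__":
    X = list(range(10))
    y = ["maj"] * 8 + ["min"] * 2
    print(Counter(random_over_sample(X, y)[1]))
    print(Counter(random_under_sample(X, y)[1]))
    print(min_cost_prediction(0.2, cost_fn=10, cost_fp=1))
    print(min_cost_prediction(0.2, cost_fn=1, cost_fp=1))
```

With equal costs the rule reduces to picking the more probable class, which is the accuracy-driven behaviour the abstract describes; raising the assumed cost of missing a minority sample shifts predictions toward the minority class without changing the training data.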
Pages: 1291 / +
Number of pages: 3