A Comparison Study of Cost-sensitive Learning and Sampling Methods on Imbalanced Data Sets

Cited by: 4
Authors
Zhang, Jinwei [1]
Lu, Huijuan [1]
Chen, Wutao [1]
Lu, Yi [2]
Affiliations
[1] China Jiliang Univ, Coll Informat Engn, Hangzhou 310018, Zhejiang, Peoples R China
[2] Prairie View A&M Univ, Dept Comp Sci, Prairie View, TX 77446 USA
Funding
National Natural Science Foundation of China; Natural Science Foundation of Zhejiang Province;
Keywords
misclassification cost; cost-sensitive learning; over-sampling; under-sampling;
DOI
10.4028/www.scientific.net/AMR.271-273.1291
Chinese Library Classification number
G40 [Education];
Discipline classification code
040101; 120403;
Abstract
A classifier built from a data set with a highly skewed class distribution generally predicts an unknown sample as the majority class much more often than as the minority class, because the classifier is designed to maximize overall classification accuracy. We compare three methods for classifying data sets that have an imbalanced class distribution and non-uniform misclassification costs: cost-sensitive learning, in which the misclassification cost is embedded in the algorithm; over-sampling; and under-sampling. The aim is to determine which of the three produces the best overall classification in a given circumstance. We reach the following conclusions: 1. Cost-sensitive learning is suitable for classifying imbalanced data sets. It outperforms the sampling methods overall and is more stable than they are, except when the data set is quite small. 2. If the data set is highly skewed or quite small, over-sampling methods may be better.
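The three approaches named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say which algorithms were used, so the sketch assumes plain random over-sampling, plain random under-sampling, and a minimum-expected-cost decision rule as one common form of cost-sensitive prediction. All function and parameter names (`over_sample`, `under_sample`, `cost_sensitive_predict`, `cost_fn`, `cost_fp`) are hypothetical.

```python
import random


def _group_by_class(samples, labels):
    """Group samples into a dict keyed by class label."""
    groups = {}
    for x, y in zip(samples, labels):
        groups.setdefault(y, []).append(x)
    return groups


def over_sample(samples, labels, seed=0):
    """Randomly duplicate samples until every class matches the largest class."""
    rng = random.Random(seed)
    groups = _group_by_class(samples, labels)
    target = max(len(xs) for xs in groups.values())
    out = []
    for y, xs in groups.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out.extend((x, y) for x in xs + extra)
    return out


def under_sample(samples, labels, seed=0):
    """Randomly discard samples until every class matches the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_class(samples, labels)
    target = min(len(xs) for xs in groups.values())
    out = []
    for y, xs in groups.items():
        out.extend((x, y) for x in rng.sample(xs, target))
    return out


def cost_sensitive_predict(p_minority, cost_fn, cost_fp):
    """Return 1 (minority) if that prediction has the lower expected cost, else 0.

    Assumed rule: predicting majority costs p * cost_fn in expectation
    (a missed minority sample); predicting minority costs (1 - p) * cost_fp
    (a false alarm on a majority sample).
    """
    return 1 if p_minority * cost_fn > (1 - p_minority) * cost_fp else 0
```

With 90 majority and 10 minority samples, `over_sample` returns 90 samples of each class and `under_sample` returns 10 of each. With `cost_fn=10` and `cost_fp=1`, a sample whose estimated minority probability is only 0.2 is still assigned to the minority class, whereas equal costs would assign it to the majority class.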
Pages: 1291 / +
Page count: 3