On the Role of Cost-Sensitive Learning in Imbalanced Data Oversampling

Authors
Krawczyk, Bartosz [1]
Wozniak, Michal [2]
Affiliations
[1] Virginia Commonwealth University, Department of Computer Science, Richmond, VA, USA
[2] Wroclaw University of Science and Technology, Department of Systems and Computer Networks, Wroclaw, Poland
Keywords
Machine learning; Imbalanced data; Cost-sensitive learning; Data preprocessing; Oversampling; SMOTE; Classification; Systems
DOI
10.1007/978-3-030-22744-9_14
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
Learning from imbalanced data is still considered one of the most challenging areas of machine learning. Among the plethora of methods dedicated to alleviating the challenge of skewed distributions, the two most distinct are data-level sampling and cost-sensitive learning. The former modifies the training set by either removing majority instances or generating additional minority ones. The latter associates a penalty cost with misclassifying minority instances, in order to mitigate the classifiers' bias towards the better-represented class. While these two approaches have been extensively studied on their own, no work so far has tried to combine their properties. Such a direction seems highly promising, as in many real-life imbalanced problems the actual misclassification cost can be obtained, and it should therefore be embedded in the classification framework regardless of the selected algorithm. This work aims to open a new direction for learning from imbalanced data by investigating the interplay between oversampling and cost-sensitive approaches. We show that there is a direct relationship between the misclassification cost imposed on the minority class and the oversampling ratios that aim to balance both classes. This becomes evident when popular skew-insensitive metrics are modified to incorporate the cost-sensitive element. Our experimental study clearly shows a strong relationship between sampling and cost, indicating that this new direction should be pursued in future work to develop new and effective algorithms for imbalanced data.
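The sketch below illustrates the simplest form of the cost/oversampling relationship described in the abstract. It assumes plain random oversampling, meaning minority instances are replicated rather than synthesized, and an integer minority cost. The toy dataset, the logistic model, and the helper names (`cost_sensitive_risk`, `random_oversample`, `imbalance_ratio`) are hypothetical illustrations. They are not the authors' code or experimental protocol, which concerns SMOTE-style oversampling.

The example checks one property. Weighting every minority-class loss by a cost C gives the same empirical risk as training on a copy of the data in which each minority instance is replicated C times. Here C is set to the imbalance ratio, so the replicated set is exactly balanced.

```python
import math
from typing import List, Tuple

# Hypothetical toy 1-D dataset: label 1 = minority class, label 0 = majority class.
Dataset = List[Tuple[float, int]]


def predict(w: float, b: float, x: float) -> float:
    """Logistic model (an illustrative choice): estimated P(y = 1 | x)."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))


def log_loss(p: float, y: int) -> float:
    """Binary cross-entropy of a single prediction."""
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def cost_sensitive_risk(data: Dataset, w: float, b: float, minority_cost: float) -> float:
    """Empirical risk in which every minority-class loss is multiplied by minority_cost."""
    return sum(
        (minority_cost if y == 1 else 1.0) * log_loss(predict(w, b, x), y)
        for x, y in data
    )


def random_oversample(data: Dataset, factor: int) -> Dataset:
    """Plain random oversampling: each minority example appears `factor` times."""
    out: Dataset = []
    for x, y in data:
        out.extend([(x, y)] * (factor if y == 1 else 1))
    return out


def imbalance_ratio(data: Dataset) -> float:
    """Number of majority examples divided by number of minority examples."""
    n_min = sum(1 for _, y in data if y == 1)
    return (len(data) - n_min) / n_min


data: Dataset = [(-2.0, 0), (-1.5, 0), (-1.0, 0), (-0.5, 0),
                 (0.0, 0), (0.5, 0), (1.5, 1), (2.0, 1)]
factor = int(imbalance_ratio(data))  # 6 majority / 2 minority -> 3

# Compare both objectives at a few arbitrary parameter settings.
for w, b in [(1.0, 0.0), (2.5, -1.0), (-0.3, 0.7)]:
    weighted = cost_sensitive_risk(data, w, b, minority_cost=factor)
    replicated = cost_sensitive_risk(random_oversample(data, factor), w, b, minority_cost=1.0)
    print(f"w={w:+.1f} b={b:+.1f}  cost C={factor}: {weighted:.6f}  oversampled x{factor}: {replicated:.6f}")
```

The two objectives agree, up to floating-point summation order, for every parameter setting, so they also share the same minimizer. SMOTE interpolates new synthetic points instead of copying existing ones. This exact equality should therefore not be expected in that setting; the sketch covers only replication.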
Pages: 180-191
Number of pages: 12