Hybrid sampling for imbalanced data

Times cited: 48
Authors
Seiffert, Chris [1]
Khoshgoftaar, Taghi M. [1]
Van Hulse, Jason [1]
Affiliation
[1] Florida Atlantic Univ, Dept Comp Sci & Engn, Data Min & Machine Learning Lab, Boca Raton, FL 33431 USA
Keywords
Class imbalance; classification; sampling; binary classification; hybrid sampling; SMOTE
DOI
10.3233/ICA-2009-0314
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Building a classification model on imbalanced datasets can be a challenging endeavor. Models built on data where examples of one class are greatly outnumbered by examples of the other class(es) tend to sacrifice accuracy with respect to the underrepresented class in favor of maximizing the overall classification rate. Several methods have been suggested to alleviate the problem of class imbalance. One common technique that has received much attention in recent research is data sampling. Data sampling either adds examples to the minority class (oversampling) or removes examples from the majority class (undersampling) in order to create a more balanced data set. Both oversampling and undersampling have their strengths and drawbacks. In this work we propose a hybrid sampling procedure that uses a combination of two sampling techniques to create a balanced data set. By using more than one sampling technique, we can combine the strengths of the individual techniques while lessening the drawbacks. We perform a comprehensive set of experiments, with more than one million classifiers built, showing that our hybrid sampling procedure almost always outperforms the individual sampling techniques.
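The abstract describes oversampling, undersampling, and a hybrid of the two without spelling out the procedure. The sketch below assumes one plausible pairing, suggested by the SMOTE keyword: random undersampling of the majority class combined with a simplified SMOTE oversampling of the minority class, with both classes meeting at a hypothetical size `target_per_class`. The function names, the balance point, and the neighbour count `k` are illustrative assumptions, not the authors' implementation or their experimental settings.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def random_undersample(majority: Sequence[Point], target: int, rng: random.Random) -> List[Point]:
    """Undersampling: keep `target` majority examples chosen uniformly at random."""
    return rng.sample(list(majority), target)


def smote_oversample(minority: Sequence[Point], n_new: int, k: int,
                     rng: random.Random) -> List[Point]:
    """Oversampling (simplified SMOTE variant): each synthetic point lies on the
    segment between a random minority example and one of its k nearest
    minority neighbours."""
    synthetic: List[Point] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # All other minority examples, sorted nearest first.
        others = sorted((minority[j] for j in range(len(minority)) if j != i),
                        key=lambda p: _sq_dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def hybrid_sample(majority: Sequence[Point], minority: Sequence[Point],
                  target_per_class: int, k: int = 5, seed: int = 0):
    """Hypothetical hybrid: shrink the majority class and grow the minority
    class until both contain `target_per_class` examples."""
    if len(minority) < 2 or k < 1:
        raise ValueError("need at least two minority examples and k >= 1")
    if not len(minority) <= target_per_class <= len(majority):
        raise ValueError("target_per_class must lie between the two class sizes")
    rng = random.Random(seed)
    kept_majority = random_undersample(majority, target_per_class, rng)
    new_minority = smote_oversample(minority, target_per_class - len(minority), k, rng)
    return kept_majority, list(minority) + new_minority


if __name__ == "__main__":
    # Toy 95:5 imbalanced data set with two features per example.
    data_rng = random.Random(1)
    majority = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(95)]
    minority = [(data_rng.gauss(3, 1), data_rng.gauss(3, 1)) for _ in range(5)]
    maj, mino = hybrid_sample(majority, minority, target_per_class=50)
    print(len(maj), len(mino))  # 50 50
```

Meeting at an intermediate size discards fewer majority examples than undersampling alone and creates fewer synthetic points than oversampling alone, which is one way a hybrid can soften the drawbacks of each technique.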
Pages: 193-210
Number of pages: 18
Related papers
50 in total
  • [1] Hybrid sampling for imbalanced data
    Seiffert, Chris
    Khoshgoftaar, Taghi M.
    Van Hulse, Jason
    [J]. PROCEEDINGS OF THE 2008 IEEE INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION, 2008: 202-207
  • [2] A Hybrid Sampling Method for Imbalanced Data
    Gazzah, Sami
    Hechkel, Amina
    Ben Amara, Najoua Essoukri
    [J]. 2015 IEEE 12TH INTERNATIONAL MULTI-CONFERENCE ON SYSTEMS, SIGNALS & DEVICES (SSD), 2015
  • [3] A Hybrid Sampling SVM Approach to Imbalanced Data Classification
    Wang, Qiang
    [J]. ABSTRACT AND APPLIED ANALYSIS, 2014
  • [4] CLUS: A New Hybrid Sampling Classification for Imbalanced Data
    Prachuabsupakij, Wanthanee
    [J]. PROCEEDINGS OF THE 2015 12TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER SCIENCE AND SOFTWARE ENGINEERING (JCSSE), 2015: 281-286
  • [5] Optimized hybrid imbalanced data sampling for decision tree training
    Wegier, Weronika
    Koziarski, Michal
    Wozniak, Michal
    [J]. PROCEEDINGS OF THE 2023 GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE COMPANION, GECCO 2023 COMPANION, 2023: 339-342
  • [6] Learning From Imbalanced Data With Deep Density Hybrid Sampling
    Liu, Chien-Liang
    Chang, Yu-Hua
    [J]. IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2022, 52(11): 7065-7077
  • [7] Hybrid probabilistic sampling with random subspace for imbalanced data learning
    Cao, Peng
    Zhao, Dazhe
    Zaiane, Osmar
    [J]. INTELLIGENT DATA ANALYSIS, 2014, 18(6): 1089-1108
  • [8] Exploratory parallel hybrid sampling framework for imbalanced data classification
    Big Data School, Yunnan Agricultural University, Kunming 650201, China
    Unspecified, 650201, China
    Unspecified, 241002, China
    [J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024
  • [9] Hybrid Sampling Method for Overlap Region of ICS Imbalanced Data
    Gao, Bing
    Gu, Zhaojun
    Zhou, Jingxian
    Sui, He
    [J]. COMPUTER ENGINEERING AND APPLICATIONS, 2023, 59(19): 305-315
  • [10] HSDP: A Hybrid Sampling Method for Imbalanced Big Data Based on Data Partition
    Chen, Liping
    Jiang, Jiabao
    Zhang, Yong
    [J]. COMPLEXITY, 2021, 2021