Hybrid sampling for imbalanced data

Cited by: 49
Authors
Seiffert, Chris [1]
Khoshgoftaar, Taghi M. [1]
Van Hulse, Jason [1]
Affiliations
[1] Florida Atlantic Univ, Dept Comp Sci & Engn, Data Min & Machine Learning Lab, Boca Raton, FL 33431 USA
Keywords
Class imbalance; classification; sampling; binary classification; hybrid sampling; SMOTE
DOI
10.3233/ICA-2009-0314
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Building a classification model on imbalanced datasets can be a challenging endeavor. Models built on data where examples of one class are greatly outnumbered by examples of the other class(es) tend to sacrifice accuracy with respect to the underrepresented class in favor of maximizing the overall classification rate. Several methods have been suggested to alleviate the problem of class imbalance. One common technique that has received much attention in recent research is data sampling. Data sampling either adds examples to the minority class (oversampling) or removes examples from the majority class (undersampling) in order to create a more balanced data set. Both oversampling and undersampling have their strengths and drawbacks. In this work we propose a hybrid sampling procedure that uses a combination of two sampling techniques to create a balanced data set. By using more than one sampling technique, we can combine the strengths of the individual techniques while lessening the drawbacks. We perform a comprehensive set of experiments, with more than one million classifiers built, showing that our hybrid sampling procedure almost always outperforms the individual sampling techniques.
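The abstract does not say which two sampling techniques are combined or in what proportions, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes plain random undersampling followed by random oversampling with replacement, and it assumes both classes should meet at the midpoint of the two original class sizes. The function name `hybrid_sample` and its parameters are hypothetical.

```python
import random
from collections import Counter


def hybrid_sample(examples, labels, minority_label, seed=0):
    """Balance a binary dataset by undersampling, then oversampling.

    Illustrative assumptions (not taken from the paper):
    - the majority class is randomly undersampled down to the midpoint
      between the two original class sizes;
    - the minority class is randomly oversampled, with replacement,
      up to that same size;
    - minority_label really is the smaller class.
    """
    rng = random.Random(seed)
    minority = [(x, y) for x, y in zip(examples, labels) if y == minority_label]
    majority = [(x, y) for x, y in zip(examples, labels) if y != minority_label]
    if not minority or not majority:
        raise ValueError("both classes must be present")

    target = (len(minority) + len(majority)) // 2

    # Step 1 (undersampling): keep a random subset of the majority class.
    kept_majority = rng.sample(majority, min(target, len(majority)))

    # Step 2 (oversampling): duplicate random minority examples.
    n_extra = max(0, target - len(minority))
    extra_minority = [rng.choice(minority) for _ in range(n_extra)]

    balanced = kept_majority + minority + extra_minority
    rng.shuffle(balanced)
    xs = [x for x, _ in balanced]
    ys = [y for _, y in balanced]
    return xs, ys


if __name__ == "__main__":
    # Toy data: 10 minority examples (label 1), 90 majority examples (label 0).
    xs, ys = hybrid_sample(list(range(100)), [1] * 10 + [0] * 90, minority_label=1)
    print(Counter(ys))
```

Compared with oversampling alone, this sketch adds fewer duplicated minority examples, and compared with undersampling alone it discards fewer majority examples, which is the trade-off the abstract describes. A synthetic oversampler such as SMOTE (listed in the keywords) could replace the duplication step.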
Pages: 193-210
Number of pages: 18
Related papers
50 in total
  • [1] Hybrid sampling for imbalanced data
    Seiffert, Chris
    Khoshgoftaar, Taghi M.
    Van Hulse, Jason
    PROCEEDINGS OF THE 2008 IEEE INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION, 2008: 202-207
  • [2] A Hybrid Sampling Method for Imbalanced Data
    Gazzah, Sami
    Hechkel, Amina
    Ben Amara, Najoua Essoukri
    2015 IEEE 12TH INTERNATIONAL MULTI-CONFERENCE ON SYSTEMS, SIGNALS & DEVICES (SSD), 2015
  • [3] A Hybrid Sampling SVM Approach to Imbalanced Data Classification
    Wang, Qiang
    ABSTRACT AND APPLIED ANALYSIS, 2014
  • [4] CLUS: A New Hybrid Sampling Classification for Imbalanced Data
    Prachuabsupakij, Wanthanee
    PROCEEDINGS OF THE 2015 12TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER SCIENCE AND SOFTWARE ENGINEERING (JCSSE), 2015: 281-286
  • [5] Optimized hybrid imbalanced data sampling for decision tree training
    Wegier, Weronika
    Koziarski, Michal
    Wozniak, Michal
    PROCEEDINGS OF THE 2023 GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE COMPANION, GECCO 2023 COMPANION, 2023: 339-342
  • [6] Learning From Imbalanced Data With Deep Density Hybrid Sampling
    Liu, Chien-Liang
    Chang, Yu-Hua
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2022, 52 (11): 7065-7077
  • [7] Hybrid probabilistic sampling with random subspace for imbalanced data learning
    Cao, Peng
    Zhao, Dazhe
    Zaiane, Osmar
    INTELLIGENT DATA ANALYSIS, 2014, 18 (06): 1089-1108
  • [8] Hybrid Sampling Method for Overlap Region of ICS Imbalanced Data
    Gao, Bing
    Gu, Zhaojun
    Zhou, Jingxian
    Sui, He
    Computer Engineering and Applications, 2023, 59 (19): 305-315
  • [9] Exploratory parallel hybrid sampling framework for imbalanced data classification
    Zheng, Ming
    Zhao, Zhuo
    Wang, Fei
    Hu, Xiaowen
    Xu, Sheng
    Li, Wanggen
    Li, Tong
    Engineering Applications of Artificial Intelligence, 2024, 138
  • [10] HSDP: A Hybrid Sampling Method for Imbalanced Big Data Based on Data Partition
    Chen, Liping
    Jiang, Jiabao
    Zhang, Yong
    COMPLEXITY, 2021, 2021