A Cluster-Based Under-Sampling Algorithm for Class-Imbalanced Data

Authors
Guzman-Ponce, A. [1, 2]
Valdovinos, R. M. [1]
Sanchez, J. S. [2]
Affiliations
[1] Univ Autonoma Estado Mexico, Fac Ingn, Toluca, Mexico
[2] Univ Jaume 1, Inst New Imaging Technol, Dept Comp Languages & Syst, Castellon de La Plana, Spain
Keywords
Class imbalance; DBSCAN; Under-sampling; Noise filtering
DOI
10.1007/978-3-030-61705-9_25
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Resampling methods are among the most popular strategies for addressing the class imbalance problem. These methods aim to compensate for the imbalanced class distribution by over-sampling the minority class and/or under-sampling the majority class. This paper introduces a new under-sampling method based on the DBSCAN clustering algorithm. The main idea is to remove the majority class instances that DBSCAN identifies as noise. The proposed method is empirically compared with well-known state-of-the-art under-sampling algorithms over 25 benchmark databases, and the experimental results demonstrate its effectiveness in terms of sensitivity, specificity, and the geometric mean of individual accuracies.
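The core idea in the abstract (dropping majority class instances that DBSCAN flags as noise) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `eps` and `min_pts` defaults, the Euclidean distance, and the choice to run DBSCAN on the majority class alone are all assumptions made for illustration. The sketch uses only the standard library and relies on the usual DBSCAN definition that a point is noise when it is neither a core point nor within `eps` of one.

```python
from math import dist


def dbscan_noise_mask(points, eps, min_pts):
    """Return a list of booleans, True where DBSCAN would label a point as noise."""
    n = len(points)
    # eps-neighborhoods; each point counts as its own neighbor.
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    # Core points have at least min_pts points in their neighborhood.
    is_core = [len(nb) >= min_pts for nb in neighbors]
    # Noise: not a core point and no core point within eps.
    return [
        not is_core[i] and not any(is_core[j] for j in neighbors[i])
        for i in range(n)
    ]


def undersample_majority(X, y, majority_label, eps=0.5, min_pts=3):
    """Drop majority class instances flagged as noise (hypothetical helper).

    Assumption: DBSCAN is applied to the majority class only; the abstract
    does not state which instances are clustered.
    """
    maj_idx = [i for i, label in enumerate(y) if label == majority_label]
    noise = dbscan_noise_mask([X[i] for i in maj_idx], eps, min_pts)
    drop = {i for i, is_noise in zip(maj_idx, noise) if is_noise}
    keep = [i for i in range(len(y)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data: four dense majority points, one isolated majority point,
# and two minority points that are never removed.
X = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (5, 5), (1, 1), (1.1, 1)]
y = [0, 0, 0, 0, 0, 1, 1]
X_res, y_res = undersample_majority(X, y, majority_label=0)
print(y_res)  # [0, 0, 0, 0, 1, 1]
```

The pairwise distance computation is quadratic in the number of majority instances, which is fine for a toy example. For real datasets, an indexed implementation such as scikit-learn's `DBSCAN` (where noise points receive the label `-1`) would be the more practical choice.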
Pages: 299-311
Number of pages: 13