A Cluster-Based Under-Sampling Algorithm for Class-Imbalanced Data

Cited by: 0
Authors
Guzman-Ponce, A. [1, 2]
Valdovinos, R. M. [1 ]
Sanchez, J. S. [2 ]
Affiliations
[1] Univ Autonoma Estado Mexico, Fac Ingn, Toluca, Mexico
[2] Univ Jaume 1, Inst New Imaging Technol, Dept Comp Languages & Syst, Castellon de La Plana, Spain
Keywords
Class imbalance; DBSCAN; Under-sampling; Noise filtering
DOI
10.1007/978-3-030-61705-9_25
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Resampling methods are among the most popular strategies for addressing the class imbalance problem. They aim to compensate for the imbalanced class distribution by over-sampling the minority class and/or under-sampling the majority class. In this paper, a new under-sampling method based on the DBSCAN clustering algorithm is introduced. The main idea is to remove the majority class instances that DBSCAN identifies as noise. The proposed method is empirically compared with well-known state-of-the-art under-sampling algorithms over 25 benchmark databases, and the experimental results demonstrate the effectiveness of the new method in terms of sensitivity, specificity, and the geometric mean of individual accuracies.
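The idea stated in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that DBSCAN is run on the majority class only, uses Euclidean distance, and relies on a brute-force noise test in pure Python. The names `dbscan_noise_mask`, `undersample_majority`, `g_mean`, and the parameter values `eps` and `min_pts` are hypothetical choices for illustration. For two classes, the geometric mean of individual accuracies is the square root of sensitivity times specificity.

```python
import math


def dbscan_noise_mask(points, eps, min_pts):
    """Return booleans: True where DBSCAN would label the point as noise.

    A point is noise when it is neither a core point (at least min_pts
    points, itself included, within distance eps) nor within eps of a
    core point.
    """
    n = len(points)
    # Brute-force neighbourhoods, O(n^2); acceptable for a small illustration.
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_pts for nb in neighbours]
    return [
        not is_core[i] and not any(is_core[j] for j in neighbours[i])
        for i in range(n)
    ]


def undersample_majority(X, y, majority_label, eps, min_pts):
    """Drop majority-class instances flagged as noise; keep everything else."""
    # Assumption: clustering is applied to the majority class only.
    maj_idx = [i for i, label in enumerate(y) if label == majority_label]
    noise = dbscan_noise_mask([X[i] for i in maj_idx], eps, min_pts)
    drop = {i for i, is_noise in zip(maj_idx, noise) if is_noise}
    keep = [i for i in range(len(y)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


def g_mean(sensitivity, specificity):
    """Geometric mean of the two per-class accuracies."""
    return math.sqrt(sensitivity * specificity)


# Toy data: four clustered majority points, one isolated majority point,
# and two minority points that are never removed.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (5, 5), (5.5, 5)]
y = [0, 0, 0, 0, 0, 1, 1]
X_res, y_res = undersample_majority(X, y, majority_label=0, eps=1.5, min_pts=3)
```

In this toy example the isolated majority point at (10, 10) has no neighbours within `eps`, so it is treated as noise and removed, while the minority instances are left untouched. A library implementation of DBSCAN would normally replace the brute-force noise test on realistic data sizes.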
Pages: 299-311
Page count: 13