FuzzyCSampling: A Hybrid fuzzy c-means clustering sampling strategy for imbalanced datasets

Cited by: 1
Authors
Maras, Abdullah [1]
Selcukcan Erol, Cigdem [1, 2, 3]
Affiliations
[1] Istanbul Univ, Inst Sci, Div Informat, Istanbul, Turkiye
[2] Istanbul Univ, Informat Dept, Istanbul, Turkiye
[3] Istanbul Univ, Fac Sci, Dept Biol, Div Bot, Istanbul, Turkiye
Keywords
Binary classification; imbalanced datasets; machine learning; sampling; fuzzy c-means; DATA-SETS; CLASSIFICATION; SMOTE; PREDICTION; ALGORITHM
DOI
10.55730/1300-0632.4044
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Classification with imbalanced datasets has recently become one of the most researched areas in machine learning applications, since such datasets lead to low-performing machine learning models. A dataset is imbalanced when the classes of the target variable have unequal numbers of examples. The most prevalent solutions for imbalanced datasets can be categorized as data preprocessing, ensemble techniques, and cost-sensitive learning. In this article, we propose a new hybrid approach for binary classification, named FuzzyCSampling, which aims to increase model performance by combining fuzzy c-means clustering with data sampling. The article compares the results of the proposed approach not only to a base model built on the imbalanced dataset but also to previously presented state-of-the-art solutions: undersampling, SMOTE oversampling, and Borderline-SMOTE oversampling. The evaluation metrics used for the comparison are accuracy, ROC AUC score, precision, recall, and F1-score. We evaluated the proposed method on three datasets with different imbalance ratios and with three machine learning algorithms (k-nearest neighbors, support vector machines, and random forest). According to the experiments, FuzzyCSampling is an effective way to improve model performance on imbalanced datasets.
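The abstract does not spell out how FuzzyCSampling couples clustering with sampling, so the following is only a generic sketch of the two ingredients it names, not the authors' algorithm. It implements plain fuzzy c-means in NumPy and then applies a hypothetical selection rule: keep the majority-class samples with the strongest cluster membership until the classes are balanced. The function names, the selection rule, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters=2, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Plain fuzzy c-means; returns (centers, memberships)."""
    rng = np.random.default_rng(seed)
    # Random initial memberships; each row sums to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m  # fuzzified weights
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Distance from every sample to every center (floored to avoid division by zero).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

def undersample_majority(X, y, majority_label=0, n_clusters=2, seed=0):
    """Hypothetical rule (an assumption, not from the paper): keep the majority
    samples with the highest peak membership until both classes are equal in size."""
    maj = np.flatnonzero(y == majority_label)
    mino = np.flatnonzero(y != majority_label)
    _, U = fuzzy_c_means(X[maj], n_clusters=n_clusters, seed=seed)
    # Rank majority samples by their strongest cluster membership, best first.
    order = np.argsort(-U.max(axis=1))
    keep = maj[order[:len(mino)]]
    idx = np.sort(np.concatenate([keep, mino]))
    return X[idx], y[idx]

# Toy imbalanced dataset: 90 majority samples, 10 minority samples.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 1, (90, 2)), rng.normal(3, 1, (10, 2))])
y = np.array([0] * 90 + [1] * 10)
X_bal, y_bal = undersample_majority(X, y)
print(np.bincount(y_bal))  # class counts after undersampling
```

A resampled set like this would then be used only for training, with accuracy, ROC AUC, precision, recall, and F1-score computed on an untouched test split so that the comparison against the imbalanced baseline stays fair.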
Pages: 1223-1236
Page count: 15