FuzzyCSampling: A Hybrid fuzzy c-means clustering sampling strategy for imbalanced datasets

Cited by: 1
Authors
Maras, Abdullah [1]
Selcukcan Erol, Cigdem [1, 2, 3]
Affiliations
[1] Istanbul Univ, Inst Sci, Div Informat, Istanbul, Turkiye
[2] Istanbul Univ, Informat Dept, Istanbul, Turkiye
[3] Istanbul Univ, Fac Sci, Dept Biol, Div Bot, Istanbul, Turkiye
Keywords
Binary classification; imbalanced datasets; machine learning; sampling; fuzzy c-means; DATA-SETS; CLASSIFICATION; SMOTE; PREDICTION; ALGORITHM;
DOI
10.55730/1300-0632.4044
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Classification with imbalanced datasets has recently become one of the most researched areas in machine learning, since such datasets lead to low-performing models. A dataset is imbalanced when the classes of the target variable have an uneven number of examples. The most prevalent solutions can be categorized as data preprocessing, ensemble techniques, and cost-sensitive learning. In this article, we propose a new hybrid approach for binary classification, named FuzzyCSampling, which aims to increase model performance by combining fuzzy c-means clustering with data sampling. The article compares the results of the proposed approach not only to a base model built on the imbalanced dataset but also to previously presented state-of-the-art solutions: undersampling, SMOTE oversampling, and Borderline-SMOTE oversampling. The evaluation metrics used for the comparison are accuracy, ROC AUC score, precision, recall, and F1-score. We evaluated the proposed method on three datasets with different imbalance ratios and with three machine learning algorithms (k-nearest neighbors, support vector machines, and random forest). According to the experiments, FuzzyCSampling is an effective way to improve model performance on imbalanced datasets.
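The abstract does not say how the clustering and sampling steps are combined, so the following is only a minimal sketch of one possible pairing, not the authors' FuzzyCSampling procedure. It runs a plain fuzzy c-means on the majority class and then undersamples it by keeping the points with the highest peak membership. The function names, the selection rule, and the parameters `c`, `m`, `iters`, and `n_keep` are all assumptions introduced for illustration.

```python
import random


def fuzzy_c_means(points, c=2, m=2.0, iters=50, seed=0):
    """Plain fuzzy c-means on a list of points; returns (centers, memberships)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by membership ** m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total
                 for d in range(dim)]
            )
        # Membership update: proportional to distance ** (-2 / (m - 1)).
        for i in range(n):
            dist = [
                max(1e-12, sum((points[i][d] - centers[k][d]) ** 2
                               for d in range(dim)) ** 0.5)
                for k in range(c)
            ]
            inv = [dk ** (-2.0 / (m - 1.0)) for dk in dist]
            total = sum(inv)
            u[i] = [v / total for v in inv]
    return centers, u


def undersample_majority(majority, n_keep, c=2, seed=0):
    """Hypothetical rule: keep the n_keep points with the highest peak membership."""
    _, u = fuzzy_c_means(majority, c=c, seed=seed)
    order = sorted(range(len(majority)), key=lambda i: max(u[i]), reverse=True)
    # Return the kept points in their original order.
    return [majority[i] for i in sorted(order[:n_keep])]
```

In this sketch, `n_keep` would typically be set to the size of the minority class so that the two classes end up balanced before a classifier is trained.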
Pages: 1223-1236
Number of pages: 15
Related papers
50 in total
  • [21] Fuzzy Clustering Using C-Means Method
    Krastev, Georgi
    Georgiev, Tsvetozar
    TEM JOURNAL-TECHNOLOGY EDUCATION MANAGEMENT INFORMATICS, 2015, 4 (02): : 144 - 148
  • [22] Ensemble Clustering via Fuzzy c-Means
    Wan, Xin
    Lin, Hao
    Li, Hong
    Liu, Guannan
    An, Maobo
    2017 14TH INTERNATIONAL CONFERENCE ON SERVICES SYSTEMS AND SERVICES MANAGEMENT (ICSSSM), 2017,
  • [23] A novel fuzzy C-means clustering algorithm
    Li, Cuixia
    Yu, Jian
    ROUGH SETS AND KNOWLEDGE TECHNOLOGY, PROCEEDINGS, 2006, 4062 : 510 - 515
  • [24] The global Fuzzy C-Means clustering algorithm
    Wang, Weina
    Zhang, Yunjie
    Li, Yi
    Zhang, Xiaona
    WCICA 2006: SIXTH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION, VOLS 1-12, CONFERENCE PROCEEDINGS, 2006, : 3604 - +
  • [25] On Fuzzy c-Means and Membership Based Clustering
    Torra, Vicenc
    ADVANCES IN COMPUTATIONAL INTELLIGENCE, PT I (IWANN 2015), 2015, 9094 : 597 - 607
  • [26] Intuitionistic fuzzy C-means clustering algorithms
    Xu, Zeshui
    Wu, Junjie
    JOURNAL OF SYSTEMS ENGINEERING AND ELECTRONICS, 2010, 21 (04) : 580 - 590
  • [27] Gaussian Collaborative Fuzzy C-Means Clustering
    Yunlong Gao
    Zhihao Wang
    Huidui Li
    Jinyan Pan
    International Journal of Fuzzy Systems, 2021, 23 : 2218 - 2234
  • [28] Fuzzy c-means clustering of incomplete data
    Hathaway, RJ
    Bezdek, JC
    IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 2001, 31 (05): : 735 - 744
  • [29] An Accelerated Fuzzy C-Means clustering algorithm
    Hershfinkel, D
    Dinstein, I
    APPLICATIONS OF FUZZY LOGIC TECHNOLOGY III, 1996, 2761 : 41 - 52
  • [30] Novel possibilistic fuzzy c-means clustering
    School of Electrical and Information Engineering, Jiangsu University, Zhenjiang 212013, China
    Unknown
    Tien Tzu Hsueh Pao, 2008, 10 (1996-2000):