FuzzyCSampling: A Hybrid fuzzy c-means clustering sampling strategy for imbalanced datasets

Cited by: 1
Authors
Maras, Abdullah [1]
Selcukcan Erol, Cigdem [1, 2, 3]
Affiliations
[1] Istanbul Univ, Inst Sci, Div Informat, Istanbul, Turkiye
[2] Istanbul Univ, Informat Dept, Istanbul, Turkiye
[3] Istanbul Univ, Fac Sci, Dept Biol, Div Bot, Istanbul, Turkiye
Keywords
Binary classification; imbalanced datasets; machine learning; sampling; fuzzy c-means; DATA-SETS; CLASSIFICATION; SMOTE; PREDICTION; ALGORITHM
DOI
10.55730/1300-0632.4044
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Classification with imbalanced datasets has recently become one of the most researched areas in machine learning applications, since such datasets lead to low-performing machine learning models. A dataset is imbalanced when the classes of the target variable have an uneven number of examples. The most prevalent solutions to imbalanced datasets can be categorized as data preprocessing, ensemble techniques, and cost-sensitive learning. In this article, we propose a new hybrid approach for binary classification, named FuzzyCSampling, which aims to increase model performance by combining fuzzy c-means clustering with data sampling. The article compares the results of the proposed approach not only to a base model built on the imbalanced dataset but also to previously presented state-of-the-art solutions: undersampling, SMOTE oversampling, and Borderline-SMOTE oversampling. The evaluation metrics used for the comparison are accuracy, ROC AUC score, precision, recall, and F1-score. We evaluated the proposed method on three datasets with different imbalance ratios and with three machine learning algorithms (k-nearest neighbors, support vector machines, and random forest). According to the experiments, FuzzyCSampling is an effective way to improve model performance on imbalanced datasets.
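The abstract does not spell out how FuzzyCSampling couples clustering with sampling, so the following is only a generic sketch of the underlying idea, not the authors' algorithm. It implements standard fuzzy c-means in plain Python and then, as an assumed illustration, undersamples the majority class by keeping the points with the highest membership in each cluster. The function names `fuzzy_c_means` and `undersample_majority`, the round-robin selection rule, and the synthetic data are all hypothetical.

```python
import math
import random


def fuzzy_c_means(points, n_clusters=2, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means; returns (centers, memberships).

    memberships[i][j] is the degree to which points[i] belongs to
    cluster j; each row sums to 1. m > 1 is the fuzzifier.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    # Random initial memberships, normalised so each row sums to 1.
    u = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(n_iter):
        # Center update: mean of the points weighted by membership**m.
        centers = []
        for j in range(n_clusters):
            weights = [u[i][j] ** m for i in range(len(points))]
            total = sum(weights)
            centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ])
        # Membership update: inverse distances raised to 2 / (m - 1).
        shift = 0.0
        for i, p in enumerate(points):
            dists = [max(math.dist(p, c), 1e-12) for c in centers]
            inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(inv)
            new_row = [v / total for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if shift < tol:
            break
    return centers, u


def undersample_majority(majority, n_keep, n_clusters=2, seed=0):
    """Assumed selection rule: keep the most typical points per cluster."""
    _, u = fuzzy_c_means(majority, n_clusters=n_clusters, seed=seed)
    # Group point indices by strongest cluster, highest membership first.
    groups = [[] for _ in range(n_clusters)]
    for i, row in enumerate(u):
        groups[row.index(max(row))].append(i)
    for j, group in enumerate(groups):
        group.sort(key=lambda i: u[i][j], reverse=True)
    # Draw round-robin from the clusters until n_keep points are selected.
    kept = []
    rank = 0
    while len(kept) < n_keep and any(rank < len(g) for g in groups):
        for group in groups:
            if rank < len(group) and len(kept) < n_keep:
                kept.append(group[rank])
        rank += 1
    return [majority[i] for i in sorted(kept)]


# Synthetic binary problem: 80 majority points in two blobs, 10 minority.
data_rng = random.Random(42)
majority = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(40)]
majority += [(data_rng.gauss(6, 1), data_rng.gauss(6, 1)) for _ in range(40)]
minority = [(data_rng.gauss(3, 0.5), data_rng.gauss(3, 0.5)) for _ in range(10)]

reduced = undersample_majority(majority, n_keep=len(minority))
print(len(reduced), "majority points kept against", len(minority), "minority points")
```

Selecting by membership keeps points near the cluster centers, so the reduced majority set still covers each dense region instead of being a uniform random draw. An oversampling variant or a different selection rule would fit the same scaffold.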
Pages: 1223-1236
Number of pages: 15
Related papers
50 in total
  • [1] Differentially Private Fuzzy C-Means Clustering Algorithms for Fuzzy Datasets
    Shakiba, Ali
    2018 6TH IRANIAN JOINT CONGRESS ON FUZZY AND INTELLIGENT SYSTEMS (CFIS), 2018, : 91 - 93
  • [2] Synthetic Minority Over-Sampling Technique based on Fuzzy C-means Clustering for Imbalanced Data
    Lee, Hansoo
    Jung, Seunghyan
    Kim, Minseok
    Kim, Sungshin
    2017 INTERNATIONAL CONFERENCE ON FUZZY THEORY AND ITS APPLICATIONS (IFUZZY), 2017,
  • [3] Construction of EBRB classifier for imbalanced data based on Fuzzy C-Means clustering
    Fu, Yang-Geng
    Ye, Ji-Feng
    Yin, Ze-Feng
    Chen, Long-Jiang
    Wang, Ying-Ming
    Liu, Geng-Geng
    KNOWLEDGE-BASED SYSTEMS, 2021, 234
  • [4] A novel intuitionistic fuzzy rough instance selection and attribute reduction with kernelized intuitionistic fuzzy C-means clustering to handle imbalanced datasets
    Tiwari, Anoop Kumar
    Nath, Abhigyan
    Pandey, Rakesh Kumar
    Maratha, Priti
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 251
  • [5] OPTIMIZATION OF FUZZY CLUSTERING CRITERIA BY A HYBRID PSO AND FUZZY C-MEANS CLUSTERING ALGORITHM
    Mehdizadeh, E.
    Sadi-Nezhad, S.
    Tavakkoli-Moghaddam, R.
    IRANIAN JOURNAL OF FUZZY SYSTEMS, 2008, 5 (03): : 1 - 14
  • [6] Clustering large amounts of healthcare datasets using fuzzy c-means algorithm
    Reddy, B. Ramakantha
    Kumar, Y. Vijay
    Prabhakar, M.
    2019 5TH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING & COMMUNICATION SYSTEMS (ICACCS), 2019, : 93 - 97
  • [7] Fuzzy c-means for fuzzy hierarchical clustering
    Vicenc, T
    FUZZ-IEEE 2005: PROCEEDINGS OF THE IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS: BIGGEST LITTLE CONFERENCE IN THE WORLD, 2005, : 646 - 651
  • [8] Mixed fuzzy C-means clustering
    Demirhan, Haydar
    INFORMATION SCIENCES, 2025, 690
  • [9] On Tolerant Fuzzy c-Means Clustering
    Hamasuna, Yukihiro
    Endo, Yasunori
    Miyamoto, Sadaaki
    JOURNAL OF ADVANCED COMPUTATIONAL INTELLIGENCE AND INTELLIGENT INFORMATICS, 2009, 13 (04) : 421 - 428
  • [10] Fuzzy Clustering Using Hybrid Fuzzy c-means and Fuzzy Particle Swarm Optimization
    Izakian, Hesam
    Abraham, Ajith
    Snasel, Vaclav
    2009 WORLD CONGRESS ON NATURE & BIOLOGICALLY INSPIRED COMPUTING (NABIC 2009), 2009, : 1689 - +