Oversampling Method Based on Gaussian Distribution and K-Means Clustering

Cited by: 6
Authors
Hassan, Masoud Muhammed [1]
Eesa, Adel Sabry [1]
Mohammed, Ahmed Jameel [2]
Arabo, Wahab Kh [1]
Affiliations
[1] Univ Zakho, Dept Comp Sci, Duhok 42001, Kurdistan Region, Iraq
[2] Duhok Polytech Univ, Dept Informat Technol, Duhok 42001, Kurdistan Region, Iraq
Source
CMC-Computers, Materials & Continua | 2021 / Vol. 69 / No. 1
Keywords
Class imbalance; oversampling; Gaussian; multivariate distribution; k-means clustering; SMOTE
DOI
10.32604/cmc.2021.018280
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Learning from imbalanced data is one of the most challenging problems in binary classification, and it has gained importance in recent years. When the class distribution is imbalanced, classical machine learning algorithms tend to be strongly biased towards the majority class and to disregard the minority class. As a result, accuracy may be high while the model fails to recognize minority-class instances, leading to many misclassifications of that class. Various methods have been proposed in the literature to handle class imbalance, but most are complicated and tend to generate unnecessary noise. In this paper, we propose a simple oversampling method based on the multivariate Gaussian distribution and K-means clustering, called GK-Means. The method aims to avoid generating noise and to control imbalance both between and within classes. Experiments were carried out with six classifiers and four oversampling methods. Results on different imbalanced datasets show that GK-Means outperforms the other oversampling methods and improves classification performance as measured by F1-score and accuracy.
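The abstract describes GK-Means only at a high level. The sketch below illustrates the general idea it names, k-means clustering of the minority class followed by sampling synthetic points from a multivariate Gaussian fitted to each cluster, using only the Python standard library. It is not the authors' implementation: the function name `gk_oversample`, the default number of clusters `k`, the small covariance ridge, and the rule that gives smaller clusters a larger share of synthetic points (one possible way to reduce within-class imbalance) are all assumptions made for illustration.

```python
import math
import random


def mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points, k, rng, iters=100):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        # Update step: move each non-empty cluster's center to its members' mean.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(mean(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def covariance(members, mu, ridge=1e-6):
    """Sample covariance plus a small ridge so tiny clusters stay positive definite."""
    d = len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for p in members:
        diff = [a - m for a, m in zip(p, mu)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j]
    denom = max(len(members) - 1, 1)
    for i in range(d):
        for j in range(d):
            cov[i][j] /= denom
        cov[i][i] += ridge
    return cov


def cholesky(a):
    """Lower-triangular L with L L^T == a, for a symmetric positive definite a."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                low[i][i] = math.sqrt(max(a[i][i] - s, 1e-12))  # guard against round-off
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_gaussian(mu, low, rng):
    """Draw x = mu + L z with z ~ N(0, I), so that x ~ N(mu, L L^T)."""
    z = [rng.gauss(0.0, 1.0) for _ in mu]
    return [mu[i] + sum(low[i][m] * z[m] for m in range(i + 1)) for i in range(len(mu))]


def gk_oversample(minority, n_new, k=2, seed=0):
    """Hypothetical GK-Means-style oversampler: k-means on the minority class,
    one Gaussian per cluster, then n_new synthetic points sampled from them."""
    rng = random.Random(seed)
    k = min(k, len(minority))
    labels = kmeans(minority, k, rng)
    clusters = [[p for p, lab in zip(minority, labels) if lab == j] for j in range(k)]
    clusters = [c for c in clusters if c]  # drop any cluster left empty
    # Assumed allocation rule: quotas proportional to 1/size, so smaller
    # clusters receive more synthetic points (reducing within-class imbalance).
    sizes = [len(c) for c in clusters]
    weights = [1.0 / s for s in sizes]
    total = sum(weights)
    quotas = [int(n_new * w / total) for w in weights]
    leftover = n_new - sum(quotas)
    for idx in sorted(range(len(clusters)), key=sizes.__getitem__)[:leftover]:
        quotas[idx] += 1  # hand rounding leftovers to the smallest clusters
    synthetic = []
    for members, quota in zip(clusters, quotas):
        mu = mean(members)
        low = cholesky(covariance(members, mu))
        synthetic.extend(sample_gaussian(mu, low, rng) for _ in range(quota))
    return synthetic


if __name__ == "__main__":
    minority = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.6), (0.4, 0.5), (10.0, 10.0), (10.4, 10.2)]
    n_majority = 20  # hypothetical majority-class size
    new_points = gk_oversample(minority, n_majority - len(minority))
    print(len(minority) + len(new_points), "minority samples after oversampling")
```

With the toy data above, the minority class grows from 6 to 20 points to match the hypothetical majority size; under the assumed 1/size rule the 2-point cluster receives 10 synthetic points and the 4-point cluster receives 4. Because each cluster gets its own Gaussian, synthetic points stay concentrated around that cluster instead of being interpolated between distant minority samples, as SMOTE can do when a point's neighbours lie in another cluster. This is one plausible reading of the abstract's noise-avoidance claim, not a description of the paper's own analysis.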
Pages: 451-469
Number of pages: 19