Oversampling Method Based on Gaussian Distribution and K-Means Clustering

Cited by: 6

Authors:

Hassan, Masoud Muhammed [1]

Eesa, Adel Sabry [1]

Mohammed, Ahmed Jameel [2]

Arabo, Wahab Kh [1]

Affiliations:

[1] Univ Zakho, Dept Comp Sci, Duhok 42001, Kurdistan Regio, Iraq

[2] Duhok Polytech Univ, Dept Informat Technol, Duhok 42001, Kurdistan Regio, Iraq

Source:

CMC-COMPUTERS MATERIALS & CONTINUA | 2021 / Vol. 69 / Issue 01

Keywords:

Class imbalance; oversampling; Gaussian; multivariate distribution; k-means clustering; SMOTE

DOI:

10.32604/cmc.2021.018280

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Learning from imbalanced data is one of the most challenging problems in binary classification, and it has gained importance in recent years. When the class distribution is imbalanced, classical machine learning algorithms tend to be strongly biased towards the majority class and to disregard the minority class. As a result, overall accuracy may be high while the model fails to recognize instances of the minority class, leading to many misclassifications. Different methods have been proposed in the literature to handle the imbalance problem, but most are complicated and tend to generate unnecessary noise. In this paper, we propose a simple oversampling method based on the multivariate Gaussian distribution and K-means clustering, called GK-Means. The new method aims to avoid generating noise and to control imbalances between and within classes. Experiments were carried out with six classifiers and four oversampling methods. Results on different imbalanced datasets show that the proposed GK-Means outperforms the other oversampling methods and improves classification performance as measured by F1-score and accuracy.
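
The abstract names the two ingredients of the method (K-means clustering and a multivariate Gaussian distribution) but not its exact steps. The following is a minimal sketch of how those two ingredients could be combined, not the published GK-Means algorithm. It assumes that the minority class is clustered, that one Gaussian is fitted per cluster, and that synthetic points are allocated to clusters in proportion to cluster size. The function names `kmeans` and `gaussian_kmeans_oversample`, the parameters `k`, `seed`, and `reg`, and the allocation rule are all hypothetical choices made for illustration.

```python
import numpy as np


def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means; returns one cluster label per row of X."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean distance).
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centers; an empty cluster keeps its previous center.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def gaussian_kmeans_oversample(X_min, n_new, k=2, seed=0, reg=1e-6):
    """Draw n_new synthetic minority points (illustrative sketch only).

    Assumption: one multivariate Gaussian is fitted to each k-means
    cluster of the minority class, and samples are drawn from it.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    d = X_min.shape[1]

    labels = kmeans(X_min, k, rng)
    clusters = [X_min[labels == j] for j in range(k)]
    clusters = [c for c in clusters if len(c) > 0]

    # Assumption: allocate synthetic points in proportion to cluster size.
    sizes = np.array([len(c) for c in clusters], dtype=float)
    counts = rng.multinomial(n_new, sizes / sizes.sum())

    synthetic = []
    for c, m in zip(clusters, counts):
        if m == 0:
            continue
        mean = c.mean(axis=0)
        # A single-point cluster has no sample covariance; use zeros.
        cov = np.cov(c, rowvar=False) if len(c) > 1 else np.zeros((d, d))
        # Small ridge term keeps the covariance positive definite.
        cov = np.atleast_2d(cov) + reg * np.eye(d)
        synthetic.append(rng.multivariate_normal(mean, cov, size=m))
    return np.vstack(synthetic) if synthetic else np.empty((0, d))
```

Sampling around cluster-level means and covariances, rather than interpolating between arbitrary pairs of minority points, is one plausible way to keep synthetic points inside dense minority regions. Whether it matches the authors' procedure should be checked against the full paper.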


Pages: 451-469

Page count: 19

