Oversampling Method Based on Gaussian Distribution and K-Means Clustering

Cited by: 6
Authors
Hassan, Masoud Muhammed [1]
Eesa, Adel Sabry [1]
Mohammed, Ahmed Jameel [2]
Arabo, Wahab Kh [1]
Affiliations
[1] Univ Zakho, Dept Comp Sci, Duhok 42001, Kurdistan Region, Iraq
[2] Duhok Polytech Univ, Dept Informat Technol, Duhok 42001, Kurdistan Region, Iraq
Source
CMC-COMPUTERS MATERIALS & CONTINUA | 2021, Vol. 69, No. 1
Keywords
Class imbalance; oversampling; Gaussian; multivariate distribution; K-means clustering; SMOTE
DOI
10.32604/cmc.2021.018280
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Learning from imbalanced data is one of the most challenging problems in binary classification, and it has gained importance in recent years. When the class distribution is imbalanced, classical machine learning algorithms tend to be strongly biased towards the majority class and to disregard the minority class. The accuracy may therefore be high even though the model fails to recognize data instances of the minority class, leading to many misclassifications. Different methods have been proposed in the literature to handle the imbalance problem, but most are complicated and tend to generate unnecessary noise. In this paper, we propose a simple oversampling method based on the multivariate Gaussian distribution and K-means clustering, called GK-Means. The new method aims to avoid generating noise and to control imbalances both between and within classes. Various experiments have been carried out with six classifiers and four oversampling methods. Experimental results on different imbalanced datasets show that the proposed GK-Means outperforms the other oversampling methods and improves classification performance as measured by F1-score and accuracy.
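The abstract names the two ingredients of GK-Means but not its exact steps. The following is a minimal sketch of one way such an oversampler could look, and it is not the authors' implementation. It assumes that the minority class is split into clusters with K-means and that synthetic points are drawn from a multivariate Gaussian fitted to each cluster. The function names, the proportional allocation of new points to clusters, and the small `ridge` term added to the covariance are all assumptions made for illustration.

```python
import numpy as np


def assign(X, centers):
    """Return the index of the nearest center for every row of X."""
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's K-means (hypothetical helper); returns cluster labels."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centers)
        # Recompute centers; keep the old center if a cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(X, centers)


def gaussian_kmeans_oversample(X_min, n_new, k=2, seed=0, ridge=1e-6):
    """Draw n_new synthetic minority samples (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    d = X_min.shape[1]
    if n_new <= 0:
        return np.empty((0, d))
    labels = kmeans(X_min, k, rng)
    sizes = np.bincount(labels, minlength=k)
    # Assumption: each cluster receives new points in proportion to its size;
    # the rounding remainder goes to the largest cluster.
    quotas = np.floor(n_new * sizes / sizes.sum()).astype(int)
    quotas[np.argmax(sizes)] += n_new - quotas.sum()
    synthetic = []
    for j in range(k):
        if quotas[j] == 0:
            continue
        members = X_min[labels == j]
        mean = members.mean(axis=0)
        if len(members) > 1:
            cov = np.atleast_2d(np.cov(members, rowvar=False))
        else:
            cov = np.zeros((d, d))
        # Assumption: a small ridge keeps the covariance positive definite.
        cov = cov + ridge * np.eye(d)
        synthetic.append(rng.multivariate_normal(mean, cov, size=quotas[j]))
    return np.vstack(synthetic)
```

A typical use would set `n_new` to the difference between the majority and minority class sizes and stack the returned rows onto the original minority samples before training a classifier. Sampling within clusters rather than from one global Gaussian is what lets a scheme of this kind follow the local shape of the minority class, which is one plausible reading of "within-class" imbalance control.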
Pages: 451-469
Number of pages: 19