Imbalanced data optimization combining K-means and SMOTE

Cited by: 1

Authors:

Li W. [1]

Affiliation:

[1] Hebei Vocational and Technical College of Building Materials, Qinhuangdao

Source:

International Journal of Performability Engineering | 2019 / Vol. 15 / No. 08

Keywords:

Classification; Imbalanced data; K-Means; Random forest; SMOTE

DOI:

10.23940/ijpe.19.08.p17.21732181

Abstract:

Imbalanced data processing is widely applied in fields such as credit card fraud identification, network intrusion detection, cancer detection, commodity recommendation, software defect prediction, and customer churn prediction, and it has become a current research hotspot. Two problems arise when classifying imbalanced data sets: the low classification accuracy of the random forest algorithm on negative class samples, and the marginalization of newly selected samples in the SMOTE algorithm. To address both, a new algorithm, KMS_SMOTE, is proposed. To avoid the marginalization of new samples, the K-Means algorithm is used to cluster the negative class samples and obtain their centroid, and the new data set is then built by selecting the samples near that centroid. To verify its effect, KMS_SMOTE is compared with the SMOTE algorithm on data sets from the UCI Machine Learning Repository. The experimental results show that KMS_SMOTE effectively improves the classification performance of the random forest algorithm on imbalanced data sets. © 2019 Totem Publisher, Inc. All rights reserved.
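The abstract gives only the outline of the method, so the following is a minimal sketch of that outline rather than the author's implementation. It assumes that the samples nearest each K-Means centroid are kept as a "core" set and that synthetic points are produced by SMOTE-style linear interpolation between two core samples of the same cluster (a simplification of SMOTE's nearest-neighbour choice). The names `kmeans`, `kms_smote_sketch`, `k`, and `keep_ratio` are hypothetical and do not come from the paper.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on a list of points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]

    def assign():
        # Each point goes to its nearest centroid (Euclidean distance).
        return [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                for p in points]

    for _ in range(iters):
        labels = assign()
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign()


def kms_smote_sketch(samples, n_new, k=2, keep_ratio=0.5, seed=0):
    """Generate n_new synthetic points from `samples`, the class to oversample
    (the "negative class" in the abstract).

    Assumption: only the keep_ratio fraction of each cluster closest to its
    centroid is used, so new points are not built from marginal samples.
    """
    rng = random.Random(seed)
    centroids, labels = kmeans(samples, k, seed=seed)

    cores = []
    for c in range(k):
        members = [p for p, lab in zip(samples, labels) if lab == c]
        if not members:
            continue
        members.sort(key=lambda p: math.dist(p, centroids[c]))
        n_keep = max(1, math.ceil(len(members) * keep_ratio))
        cores.append(members[:n_keep])

    synthetic = []
    while len(synthetic) < n_new:
        core = rng.choice(cores)
        a, b = rng.choice(core), rng.choice(core)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([x + gap * (y - x) for x, y in zip(a, b)])
    return synthetic


if __name__ == "__main__":
    data = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [3.0, 3.0],
            [5.0, 5.0], [5.1, 4.9], [4.8, 5.2]]
    for point in kms_smote_sketch(data, n_new=3):
        print(point)
```

In this sketch the synthetic points would be appended to the original training set before fitting a random forest. Because every new point is a convex combination of two retained samples, it cannot fall outside the region spanned by the core set.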

Pages: 2173-2181

Page count: 8
