Imbalanced data optimization combining K-means and SMOTE

Author
Li W. [1]
Affiliation
[1] Hebei Vocational and Technical College of Building Materials, Qinhuangdao
Source
International Journal of Performability Engineering | 2019 / Vol. 15 / Issue 08
Keywords
Classification; Imbalanced data; K-Means; Random forest; SMOTE
DOI
10.23940/ijpe.19.08.p17.21732181
Abstract
Imbalanced data arises in many application areas, such as credit card fraud identification, network intrusion detection, cancer detection, commodity recommendation, software defect prediction, and customer churn prediction, and it has become an active research topic. Two problems arise when classifying imbalanced data sets: the random forest algorithm classifies negative class samples with low accuracy, and the SMOTE algorithm tends to pick marginal samples when generating new ones. To address both, a new algorithm, KMS_SMOTE, is proposed. To avoid the marginalization of new samples, the K-Means algorithm is used to cluster the negative class samples and obtain the cluster centroids, and the new data set is then built by selecting samples near those centroids. To verify its effect, KMS_SMOTE is compared with the SMOTE algorithm on data sets from the UCI Machine Learning Repository. The experimental results show that KMS_SMOTE effectively improves the classification performance of the random forest algorithm on imbalanced data sets. © 2019 Totem Publisher, Inc. All rights reserved.
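The abstract gives only the outline of KMS_SMOTE, so the following is a minimal sketch of the general idea rather than the paper's implementation. It assumes that the "negative class" is the under-represented class, that synthetic samples are produced by SMOTE-style linear interpolation between a selected sample and its cluster centroid, and that "near the centroid" means the closest fraction of each cluster. The names `kmeans`, `oversample_near_centroids`, and `keep_ratio` are hypothetical, and only the Python standard library is used.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on lists of floats; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]

    def assign():
        # Assignment step: attach each point to its closest centroid.
        return [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                for p in points]

    for _ in range(iters):
        labels = assign()
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign()


def oversample_near_centroids(minority, n_new, k=2, keep_ratio=0.5, seed=0):
    """Generate n_new synthetic points from samples close to k-means centroids.

    Assumption (not from the paper): a seed sample is interpolated towards
    its own cluster centroid, so new points cannot land on the class border.
    """
    rng = random.Random(seed)
    centroids, labels = kmeans(minority, k, seed=seed)

    clusters = []
    for c, centre in enumerate(centroids):
        members = sorted(
            (p for p, lab in zip(minority, labels) if lab == c),
            key=lambda p: math.dist(p, centre),
        )
        if members:
            # Keep only the closest fraction of each cluster as seeds.
            keep = max(1, int(len(members) * keep_ratio))
            clusters.append((centre, members[:keep]))

    synthetic = []
    while len(synthetic) < n_new:
        centre, seeds = rng.choice(clusters)
        seed_point = rng.choice(seeds)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            [s + gap * (c - s) for s, c in zip(seed_point, centre)]
        )
    return synthetic


if __name__ == "__main__":
    minority = [[0, 0], [0, 1], [1, 0], [1, 1],
                [10, 10], [10, 11], [11, 10], [11, 11]]
    for point in oversample_near_centroids(minority, n_new=4, k=2, seed=1):
        print(point)
```

In practice the synthetic points would be appended to the training set before fitting a classifier such as a random forest. Plain SMOTE instead interpolates between a sample and one of its nearest neighbours, which is where border samples can be picked as seeds.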
Pages: 2173 - 2181
Number of pages: 8