Imbalanced data optimization combining K-means and SMOTE

Cited by: 1
Author
Li W. [1]
Affiliation
[1] Hebei Vocational and Technical College of Building Materials, Qinhuangdao
Source
International Journal of Performability Engineering | 2019 / Vol. 15 / No. 08
Keywords
Classification; Imbalanced data; K-Means; Random forest; SMOTE
DOI
10.23940/ijpe.19.08.p17.21732181
Abstract
With the wide application of imbalanced data processing in fields such as credit card fraud identification, network intrusion detection, cancer detection, commodity recommendation, software defect prediction, and customer churn prediction, imbalanced data has become a current research hotspot. When classifying imbalanced data sets, the random forest algorithm achieves low classification accuracy on the negative class samples, and the SMOTE algorithm tends to marginalize the newly generated samples. To address these two problems, a new algorithm, KMS_SMOTE, is proposed for handling imbalanced data sets. To avoid marginalization of the new samples, the K-Means algorithm is used to cluster the negative class samples and obtain their centroids, and a new data set is then formed by selecting the samples near each centroid. Finally, to verify the effect of the KMS_SMOTE algorithm, it is compared with the SMOTE algorithm on data sets from the UCI machine learning repository. The experimental results show that the KMS_SMOTE algorithm effectively improves the classification performance of the random forest algorithm on imbalanced data sets. © 2019 Totem Publisher, Inc. All rights reserved.
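The abstract describes the KMS_SMOTE pipeline only at a high level. The following Python sketch is not from the paper; the function name, parameters such as keep_ratio and k_neighbors, and the scikit-learn-based implementation are all assumptions. It illustrates one plausible reading: cluster the negative (minority) class with K-Means, retain the samples closest to their centroids to avoid marginal points, interpolate SMOTE-style synthetic samples from that retained subset, and train a random forest on the rebalanced data.

# Illustrative sketch of the KMS_SMOTE idea (assumed implementation, not the authors' code):
# 1) cluster the negative (minority) class with K-Means,
# 2) keep only the samples closest to their cluster centroid,
# 3) interpolate synthetic samples SMOTE-style from that subset,
# 4) train a random forest on the rebalanced data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors
from sklearn.ensemble import RandomForestClassifier

def kms_smote(X_min, n_new, n_clusters=3, keep_ratio=0.5, k_neighbors=5, seed=0):
    """Return n_new synthetic minority samples (parameter names are assumptions)."""
    rng = np.random.default_rng(seed)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X_min)
    # Distance of each minority sample to its own cluster centroid.
    dist = np.linalg.norm(X_min - km.cluster_centers_[km.labels_], axis=1)
    # Retain the samples nearest their centroids to avoid marginal points.
    n_keep = max(k_neighbors + 1, int(keep_ratio * len(X_min)))
    core = X_min[np.argsort(dist)[:n_keep]]
    # SMOTE-style interpolation between a retained sample and one of its neighbors.
    nn = NearestNeighbors(n_neighbors=min(k_neighbors + 1, len(core))).fit(core)
    _, idx = nn.kneighbors(core)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(core))
        j = rng.choice(idx[i][1:])           # a neighbor other than the sample itself
        synthetic.append(core[i] + rng.random() * (core[j] - core[i]))
    return np.asarray(synthetic)

# Usage sketch: X and y are a feature matrix and binary labels with the
# minority class labeled 1 (placeholder names, not from the paper).
# X_min = X[y == 1]
# X_syn = kms_smote(X_min, n_new=(y == 0).sum() - (y == 1).sum())
# X_bal = np.vstack([X, X_syn])
# y_bal = np.concatenate([y, np.ones(len(X_syn))])
# clf = RandomForestClassifier(random_state=0).fit(X_bal, y_bal)

Filtering by distance to the centroid is what keeps the interpolation away from the class boundary, which is the marginalization problem the abstract attributes to plain SMOTE.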
Pages: 2173-2181
Number of pages: 8
Related Papers
50 in total
  • [31] Interpretation and optimization of the k-means algorithm
    Sabo, Kristian
    Scitovski, Rudolf
    APPLICATIONS OF MATHEMATICS, 2014, 59 (04) : 391 - 406
  • [32] Rainfall flow optimization based K-Means clustering for medical data
    Jaya Mabel Rani, Antony
    Pravin, Albert
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2021, 33 (17):
  • [33] Combined Elephant Herding Optimization Algorithm with K-means for Data Clustering
    Tuba, Eva
    Dolicanin-Djekic, Diana
    Jovanovic, Raka
    Simian, Dana
    Tuba, Milan
    INFORMATION AND COMMUNICATION TECHNOLOGY FOR INTELLIGENT SYSTEMS, ICTIS 2018, VOL 2, 2019, 107 : 665 - 673
  • [34] Optimization study on k value of K-means algorithm
    Institute of Computer Network System, Hefei University of Technology, Hefei 230009, China
    Xitong Gongcheng Lilun yu Shijian, 2006, (2): 97-101
  • [35] Optimization of K-Means Algorithm: Ant Colony Optimization
    Reddy, T. Namratha
    Supreethi, K. P.
    2017 INTERNATIONAL CONFERENCE ON COMPUTING METHODOLOGIES AND COMMUNICATION (ICCMC), 2017, : 530 - 535
  • [36] K-means for Evolving Data Streams
    Bidaurrazaga, Arkaitz
    Perez, Aritz
    Capo, Marco
    2021 21ST IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2021), 2021, : 1006 - 1011
  • [37] K-means algorithms for functional data
    Lopez Garcia, Maria Luz
    Garcia-Rodenas, Ricardo
    Gonzalez Gomez, Antonia
    NEUROCOMPUTING, 2015, 151 : 231 - 245
  • [38] K-Means Clustering With Incomplete Data
    Wang, Siwei
    Li, Miaomiao
    Hu, Ning
    Zhu, En
    Hu, Jingtao
    Liu, Xinwang
    Yin, Jianping
    IEEE ACCESS, 2019, 7 : 69162 - 69171
  • [39] Two-step clustering for data reduction combining DBSCAN and k-means clustering
    Kremers, Bart J. J.
    Citrin, Jonathan
    Ho, Aaron
    van der Plassche, Karel L.
    CONTRIBUTIONS TO PLASMA PHYSICS, 2023, 63 (5-6)
  • [40] Combining Parallel Self-Organizing Maps and K-Means to Cluster Distributed Data
    Gorgonio, Flavius L.
    Costa, Jose Alfredo F.
    CSE 2008: PROCEEDINGS OF THE 11TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND ENGINEERING, 2008, : 53 - 58