Imbalanced data optimization combining K-means and SMOTE

Cited by: 1
Author
Li W. [1]
Affiliation
[1] Hebei Vocational and Technical College of Building Materials, Qinhuangdao
Source
International Journal of Performability Engineering, 2019, Vol. 15, No. 8
Keywords
Classification; Imbalanced data; K-Means; Random forest; SMOTE
DOI
10.23940/ijpe.19.08.p17.21732181
Abstract
Imbalanced data processing is now applied in many fields, such as credit card fraud identification, network intrusion detection, cancer detection, commodity recommendation, software defect prediction, and customer churn prediction, and imbalanced data has become a major research topic. When imbalanced data sets are classified, the random forest algorithm achieves low accuracy on negative-class (minority) samples, and the SMOTE algorithm tends to place new samples at the margins of the class distribution (marginalization). To address both problems, a new algorithm, KMS_SMOTE, is proposed. To avoid marginalization of the new samples, the K-Means algorithm clusters the negative-class samples to obtain their centroids, and the new data set is then built by selecting samples near those centroids. To verify its effect, KMS_SMOTE is compared with SMOTE on data sets from the UCI Machine Learning Repository. The experimental results show that KMS_SMOTE effectively improves the classification performance of the random forest algorithm on imbalanced data sets. © 2019 Totem Publisher, Inc. All rights reserved.
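The abstract describes KMS_SMOTE only at a high level, so the following Python sketch shows one possible reading of it under assumptions that do not come from the paper: the negative-class (minority) samples are clustered with plain Lloyd's K-means, the `keep_ratio` fraction of each cluster closest to its centroid is retained, and standard SMOTE interpolation is applied only among retained samples of the same cluster. The names `kms_smote`, `keep_ratio`, `k_clusters`, and `k_neighbors` are hypothetical, and the random forest classification stage is omitted.

```python
import math
import random
from typing import List, Tuple

Point = List[float]


def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(p: Point, centroids: List[Point]) -> int:
    """Index of the centroid closest to p."""
    return min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))


def kmeans(points: List[Point], k: int, rng: random.Random,
           iters: int = 50) -> Tuple[List[Point], List[int]]:
    """Plain Lloyd's K-means; initial centroids are k distinct input samples."""
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]  # assignment step
        new_centroids = []
        for j in range(k):  # update step: mean of each cluster
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # empty cluster keeps its centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, [nearest(p, centroids) for p in points]


def kms_smote(minority: List[Point], n_new: int, k_clusters: int = 2,
              keep_ratio: float = 0.5, k_neighbors: int = 3,
              seed: int = 0) -> List[Point]:
    """Hypothetical KMS_SMOTE-style oversampling of the minority class."""
    rng = random.Random(seed)
    centroids, labels = kmeans(minority, k_clusters, rng)

    # Assumption: keep only the samples of each cluster closest to its centroid.
    groups = []
    for j, c in enumerate(centroids):
        members = sorted((p for p, lab in zip(minority, labels) if lab == j),
                         key=lambda p: dist(p, c))
        kept = members[:max(2, math.ceil(keep_ratio * len(members)))]
        if len(kept) >= 2:  # SMOTE needs at least one neighbour
            groups.append(kept)
    if not groups:
        raise ValueError("no cluster has two or more retained samples")

    synthetic = []
    for _ in range(n_new):
        group = rng.choice(groups)
        i = rng.randrange(len(group))
        x = group[i]
        # Standard SMOTE step: pick one of x's k nearest retained neighbours
        # in the same cluster and interpolate at a random position between them.
        others = sorted((q for t, q in enumerate(group) if t != i),
                        key=lambda q: dist(x, q))[:k_neighbors]
        nb = rng.choice(others)
        gap = rng.random()
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


if __name__ == "__main__":
    minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3],
                [5.0, 5.0], [5.2, 4.8], [4.9, 5.3], [5.1, 5.1]]
    for point in kms_smote(minority, n_new=4):
        print([round(v, 3) for v in point])
```

Because every synthetic point lies on a segment between two retained samples of one cluster, it stays inside the convex hull of the near-centroid samples rather than drifting toward the class border, which is the marginalization problem the abstract attributes to SMOTE. Other readings, such as interpolating each sample directly toward its cluster centroid, would fit the abstract's wording equally well.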
Pages: 2173–2181 (9 pages)