Imbalanced data optimization combining K-means and SMOTE

Cited by: 1
Authors
Li W. [1]
Affiliations
[1] Hebei Vocational and Technical College of Building Materials, Qinhuangdao
Keywords
Classification; Imbalanced data; K-Means; Random forest; SMOTE
DOI
10.23940/ijpe.19.08.p17.21732181
Abstract
Imbalanced data arises in many fields, including credit card fraud identification, network intrusion detection, cancer detection, commodity recommendation, software defect prediction, and customer churn prediction, and has become a current research hotspot. When classifying imbalanced data sets, the random forest algorithm achieves low classification accuracy on negative class samples, and the SMOTE algorithm suffers from marginalization when selecting new samples. To address both problems, a new algorithm, KMS_SMOTE, is proposed for imbalanced data sets. To avoid the marginalization of new samples, the K-Means algorithm is used to cluster the negative class samples and obtain their centroid, and the new data set is then obtained by selecting the samples near the centroid. To verify its effect, the KMS_SMOTE algorithm is compared with the SMOTE algorithm on data sets from the UCI machine learning repository. The experimental results show that the KMS_SMOTE algorithm effectively improves the classification performance of the random forest algorithm on imbalanced data sets. © 2019 Totem Publisher, Inc. All rights reserved.
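The abstract gives only the outline of KMS_SMOTE: cluster the negative class samples with K-Means, then build new samples from those near the centroid. The following is a minimal sketch of that idea, not the paper's implementation. Several details are assumptions made for illustration: that the negative class is the minority class, the hypothetical function names `kmeans` and `oversample_near_centroids`, the parameters `k` and `keep_ratio`, and the choice to generate each synthetic point by SMOTE-style interpolation between a near-centroid sample and its cluster centroid.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on lists of floats; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]

    def nearest(p):
        return min(range(k), key=lambda c: math.dist(p, centroids[c]))

    for _ in range(iters):
        labels = [nearest(p) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # Final assignment against the final centroids.
    labels = [nearest(p) for p in points]
    return centroids, labels


def oversample_near_centroids(minority, n_new, k=2, keep_ratio=0.5, seed=0):
    """Generate n_new synthetic minority samples close to cluster centroids.

    Assumption (not from the paper): each synthetic point is interpolated
    between a near-centroid sample and the centroid of its cluster.
    """
    rng = random.Random(seed)
    centroids, labels = kmeans(minority, k, seed=seed)

    # For each non-empty cluster, keep the fraction of members nearest its centroid.
    pools = []
    for c, centre in enumerate(centroids):
        members = sorted(
            (p for p, lab in zip(minority, labels) if lab == c),
            key=lambda p: math.dist(p, centre),
        )
        if members:
            n_keep = max(1, int(len(members) * keep_ratio))
            pools.append((centre, members[:n_keep]))

    synthetic = []
    while len(synthetic) < n_new:
        centre, pool = rng.choice(pools)
        base = rng.choice(pool)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (c - b) for b, c in zip(base, centre)])
    return synthetic
```

Calling `oversample_near_centroids(minority, n_new=10, k=2)` on a list of minority-class feature vectors returns 10 synthetic vectors. Because every synthetic point lies between a retained sample and its cluster centroid, none falls on the outer margin of the minority class, which is the marginalization problem the abstract describes. The augmented data would then be passed to a classifier such as a random forest.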
Pages: 2173-2181
Page count: 8
Related papers
50 in total
  • [1] A cluster-based oversampling algorithm combining SMOTE and k-means for imbalanced medical data
    Xu, Zhaozhao
    Shen, Derong
    Nie, Tiezheng
    Kou, Yue
    Yin, Nan
    Han, Xi
    INFORMATION SCIENCES, 2021, 572 : 574 - 589
  • [2] An Improved Oversampling Method for imbalanced Data-SMOTE Based on Canopy and K-means
    Guo, Chaoyou
    Ma, Yankun
    Xu, Zhe
    Cao, Mengmeng
    Yao, Qian
    2019 CHINESE AUTOMATION CONGRESS (CAC2019), 2019, : 1467 - 1469
  • [3] The incremental SMOTE: A new approach based on the incremental k-means algorithm for solving imbalanced data set problem
    Turan, Duygu Selin
    Ordin, Burak
    INFORMATION SCIENCES, 2025, 711
  • [4] An automatic identification method of imbalanced lithology based on Deep Forest and K-means SMOTE
    Zhu, Xinyi
    Zhang, Hongbing
    Ren, Quan
    Zhang, Dailu
    Zeng, Fanxing
    Zhu, Xinjie
    Zhang, Lingyuan
    GEOENERGY SCIENCE AND ENGINEERING, 2023, 224
  • [5] Improving imbalanced learning through a heuristic oversampling method based on k-means and SMOTE
    Douzas, Georgios
    Bacao, Fernando
    Last, Felix
    INFORMATION SCIENCES, 2018, 465 : 1 - 20
  • [6] An AdaBoost Method with K'K-Means Bayes Classifier for Imbalanced Data
    Zhang, Yanfeng
    Wang, Lichun
    MATHEMATICS, 2023, 11 (08)
  • [7] Combining K-means and Particle Swarm Optimization for Dynamic Data Clustering Problems
    Kao, Yucheng
    Lee, Szu-Yuan
    2009 IEEE INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING AND INTELLIGENT SYSTEMS, PROCEEDINGS, VOL 1, 2009, : 757 - 761
  • [8] Undersampled K-means approach for handling imbalanced distributed data
    Kumar, N. Santhosh
    Rao, K. Nageswara
    Govardhan, A.
    Reddy, K. Sudheer
    Mahmood, Ali Mirza
    PROGRESS IN ARTIFICIAL INTELLIGENCE, 2014, 3 (01) : 29 - 38
  • [9] Combining PSO and k-means to Enhance Data Clustering
    Ahmadyfard, Alireza
    Modares, Hamidreza
    2008 INTERNATIONAL SYMPOSIUM ON TELECOMMUNICATIONS, VOLS 1 AND 2, 2008, : 688 - 691
  • [10] Malicious Domain Detection Based on K-means and SMOTE
    Wang, Qing
    Li, Linyu
    Jiang, Bo
    Lu, Zhigang
    Liu, Junrong
    Jian, Shijie
    COMPUTATIONAL SCIENCE - ICCS 2020, PT II, 2020, 12138 : 468 - 481