A Novel Algorithm for Imbalance Data Classification Based on Genetic Algorithm Improved SMOTE

Cited by: 2
Authors
Kun Jiang
Jing Lu
Kuiliang Xia
Affiliations
[1] HeiHe University
Keywords
Imbalanced dataset; Classification; SMOTE; Sampling rate; Genetic algorithm; Rockburst
DOI
Not available
Abstract
The classification of imbalanced data is recognized as a crucial problem in machine learning and data mining. In an imbalanced dataset, one class has significantly fewer training instances than another, so minority class instances are much more likely to be misclassified. The synthetic minority over-sampling technique (SMOTE) was developed to address this problem: it balances the dataset by synthesizing new minority class samples from the existing minority class instances. However, existing SMOTE-based algorithms use the same sampling rate for all minority class instances, which results in sub-optimal performance. To address this issue, we propose a novel genetic algorithm-based SMOTE (GASMOTE) algorithm. GASMOTE uses different sampling rates for different minority class instances and searches for the optimal combination of sampling rates. Experimental results on ten typical imbalanced datasets show that, compared with SMOTE, GASMOTE improves the F-measure by 5.9% and the G-mean by 1.6%; compared with Borderline-SMOTE, it improves the F-measure by 3.7% and the G-mean by 2.3%. GASMOTE can therefore serve as a new over-sampling technique for imbalanced dataset classification. We have also applied GASMOTE to a practical engineering problem: rockburst prediction on the VCR rockburst datasets. The experimental results indicate that GASMOTE can accurately predict rockburst occurrence and hence provides guidance for the design and construction of safe deep mining engineering structures.
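The central idea of the abstract, a separate sampling rate for each minority class instance, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`nearest_neighbors`, `smote_per_instance`), the toy data, and the rate vector are hypothetical, and the genetic algorithm that would search for the rate vector is not shown. The sketch assumes standard SMOTE interpolation between an instance and one of its k nearest minority neighbors under Euclidean distance.

```python
import math
import random


def nearest_neighbors(idx, minority, k):
    """Indices of the k minority points closest to minority[idx] (Euclidean)."""
    others = [j for j in range(len(minority)) if j != idx]
    others.sort(key=lambda j: math.dist(minority[idx], minority[j]))
    return others[:k]


def smote_per_instance(minority, rates, k=3, seed=0):
    """Generate rates[i] synthetic points from minority[i].

    Each synthetic point lies on the segment between the seed instance
    and one of its k nearest minority neighbors, as in standard SMOTE.
    """
    if len(minority) != len(rates):
        raise ValueError("one sampling rate per minority instance is required")
    rng = random.Random(seed)
    synthetic = []
    for i, rate in enumerate(rates):
        neighbors = nearest_neighbors(i, minority, k)
        if not neighbors:
            continue  # a lone minority instance has nothing to interpolate with
        for _ in range(rate):
            j = rng.choice(neighbors)
            gap = rng.random()  # interpolation factor in [0, 1)
            point = tuple(a + gap * (b - a)
                          for a, b in zip(minority[i], minority[j]))
            synthetic.append(point)
    return synthetic


# Toy minority class and a hypothetical rate vector (an assumption, standing
# in for one candidate solution that a genetic algorithm might evaluate).
minority = [(1.0, 1.0), (1.2, 0.9), (3.0, 3.1), (2.9, 3.3)]
rates = [0, 1, 3, 2]
new_points = smote_per_instance(minority, rates)
print(len(new_points))  # 6, the sum of the rates
```

Plain SMOTE corresponds to a rate vector whose entries are all equal. In a GASMOTE-style search, each candidate rate vector would presumably be scored by training a classifier on the resampled data and measuring a criterion such as the F-measure or G-mean; that scoring step is an assumption here, since the abstract does not describe the fitness function.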
Pages: 3255-3266
Page count: 11
Related papers
50 in total
  • [1] A Novel Algorithm for Imbalance Data Classification Based on Genetic Algorithm Improved SMOTE
    Jiang, Kun
    Lu, Jing
    Xia, Kuiliang
    ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 2016, 41 (08) : 3255 - 3266
  • [2] Imbalance Data Classification Method Based on Improved SMOTE Algorithm and Granular Computing
    Dong, QiLiang
    Lu, Wei
    2022 41ST CHINESE CONTROL CONFERENCE (CCC), 2022, : 3196 - 3201
  • [3] Ensemble classification algorithm based improved SMOTE for imbalanced data
    Ning, Liu, 1600, Natsional'nyi Hirnychyi Universytet
  • [4] A Novel Algorithm for Imbalance Data Classification Based on Neighborhood Hypergraph
    Hu, Feng
    Liu, Xiao
    Dai, Jin
    Yu, Hong
    SCIENTIFIC WORLD JOURNAL, 2014,
  • [5] SVM Classification: Optimization with the SMOTE Algorithm for the Class Imbalance Problem
    Demidova, Liliya
    Klyueva, Irina
    2017 6TH MEDITERRANEAN CONFERENCE ON EMBEDDED COMPUTING (MECO), 2017, : 472 - 475
  • [6] Ensemble imbalance classification: Using data preprocessing, clustering algorithm and genetic algorithm
    Abolkarlou, Niloofar Afshari
    Niknafs, Ali Akbar
    Ebrahimpour, Mohammad Kazem
    2014 4TH INTERNATIONAL CONFERENCE ON COMPUTER AND KNOWLEDGE ENGINEERING (ICCKE), 2014, : 171 - 176
  • [7] Research on expansion and classification of imbalanced data based on SMOTE algorithm
    Shujuan Wang
    Yuntao Dai
    Jihong Shen
    Jingxue Xuan
    Scientific Reports, 11
  • [8] Research on expansion and classification of imbalanced data based on SMOTE algorithm
    Wang, Shujuan
    Dai, Yuntao
    Shen, Jihong
    Xuan, Jingxue
    SCIENTIFIC REPORTS, 2021, 11 (01)
  • [9] A Novel SMOTE-Based Classification Approach to Online Data Imbalance Problem
    Gong, Chunlin
    Gu, Liangxian
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2016, 2016
  • [10] A dissimilarity-based imbalance data classification algorithm
    Zhang, Xueying
    Song, Qinbao
    Wang, Guangtao
    Zhang, Kaiyuan
    He, Liang
    Jia, Xiaolin
    APPLIED INTELLIGENCE, 2015, 42 (03) : 544 - 565