A Novel Evolutionary Preprocessing Method Based on Over-sampling and Under-sampling for Imbalanced Datasets

Authors
Wong, Ginny Y. [1]
Leung, Frank H. F. [1]
Ling, Sai-Ho [2]
Affiliations
[1] Hong Kong Polytech Univ, Dept Elect & Informat Engn, Hong Kong, Hong Kong, Peoples R China
[2] Univ Technol Sydney, Fac Engn & Informat Technol, Sydney, NSW, Australia
Keywords
CLASSIFICATION; ALGORITHMS;
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Imbalanced datasets are commonly encountered in real-world classification problems. However, many machine learning algorithms were originally designed for well-balanced datasets. Re-sampling has therefore become an important step in preprocessing imbalanced datasets. It aims to balance a dataset either by increasing the sample size of the smaller class or by decreasing the sample size of the larger class; these are known as over-sampling and under-sampling, respectively. In this paper, a novel sampling strategy based on both over-sampling and under-sampling is proposed, in which the new samples of the smaller class are created by the Synthetic Minority Over-sampling Technique (SMOTE). The datasets are then improved by the evolutionary computation method CHC, which works on both the minority-class and majority-class samples. The result is a hybrid data preprocessing method that combines over-sampling and under-sampling techniques to re-sample datasets. The method is evaluated by applying the learning algorithm C4.5 to obtain a classification model from the re-sampled datasets. Experimental results show that the proposed approach can decrease the over-sampling rate by about 50% with a discrepancy of only around 3% in accuracy.
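The two re-sampling directions named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: `smote_like` is a simplified SMOTE-style interpolation, and `random_under_sample` uses plain random selection as a stand-in for the CHC evolutionary search, which is not reproduced here. The function names, the neighbour count `k`, and the toy data are all assumptions made for illustration.

```python
import random


def smote_like(minority, n_new, k=3, rng=None):
    """Create n_new synthetic minority points (simplified SMOTE-style sketch).

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = rng or random.Random(0)
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Other minority samples, sorted by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


def random_under_sample(majority, n_keep, rng=None):
    """Keep n_keep majority points chosen at random, without replacement.

    Assumption: random choice replaces the CHC-based selection of the paper.
    """
    rng = rng or random.Random(0)
    return rng.sample(majority, n_keep)


# Toy data (hypothetical): 4 minority points and 20 majority points in 2-D.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
majority = [(float(i), float(i % 5)) for i in range(5, 25)]

new_points = smote_like(minority, n_new=4)
kept = random_under_sample(majority, n_keep=8)
balanced_minority = minority + new_points

print(len(balanced_minority), len(kept))  # prints "8 8"
```

Growing the minority class only part of the way and shrinking the majority class to meet it is the general idea behind a hybrid scheme: fewer synthetic samples are needed than with over-sampling alone.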
Pages: 2354-2359
Page count: 6