Enhanced SMOTE Algorithm for Classification of Imbalanced Big-Data using Random Forest

Authors
Bhagat, Reshma C. [1]
Patil, Sachin S. [1]
Affiliations
[1] Rajarambapu Inst Technol, Dept CSE, Islampur Sangli, MS, India
Keywords
Data mining; Multi-class imbalanced data; Oversampling; MapReduce; Machine learning
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
In the era of big data, applications that generate tremendous amounts of data have become a main focus of attention, owing to the rapid growth in data generation and storage over the last few years. This scenario is challenging for data mining techniques that have not been adapted to the new space and time requirements. In many real-world applications, the classification of imbalanced datasets is of particular interest. Most classification methods focus on the two-class imbalanced problem, so it is necessary to address the multi-class imbalanced problem, which also exists in real-world domains. In the proposed work, we introduce a methodology for classifying multi-class imbalanced data. The methodology consists of two steps. In the first step, binarization techniques, One-vs-All (OVA) and One-vs-One (OVO), decompose the original dataset into binary-class subsets. In the second step, the SMOTE algorithm is applied to each imbalanced binary-class subset to obtain balanced data. Finally, a Random Forest (RF) classifier performs the classification. Specifically, the oversampling technique is adapted to big data using MapReduce so that it can handle datasets as large as needed. An experimental study evaluates the performance of the proposed method. For the experimental analysis, we used different datasets from the UCI repository, and the proposed system is implemented on the Apache Hadoop and Apache Spark platforms. The results show that the proposed method outperforms the other methods.
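The abstract gives no implementation details, so the following is only a minimal single-machine sketch of the first two steps it names: OVO decomposition followed by SMOTE on each binary subset. The function names (`ovo_subsets`, `smote`, `balance_binary`), the neighbour count `k`, and the toy data are illustrative assumptions, not the authors' code. The MapReduce distribution and the Random Forest step are deliberately left out.

```python
import math
import random
from itertools import combinations


def ovo_subsets(X, y):
    """One-vs-One: yield one binary subset per unordered pair of classes."""
    for a, b in combinations(sorted(set(y)), 2):
        rows = [(x, label) for x, label in zip(X, y) if label in (a, b)]
        yield (a, b), [r[0] for r in rows], [r[1] for r in rows]


def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def balance_binary(X, y, k=5, rng=None):
    """Oversample the smaller class of a binary subset until sizes match."""
    groups = {c: [x for x, label in zip(X, y) if label == c] for c in set(y)}
    sizes = {c: len(g) for c, g in groups.items()}
    minority = min(sizes, key=sizes.get)
    deficit = max(sizes.values()) - sizes[minority]
    new_points = smote(groups[minority], deficit, k, rng) if deficit else []
    return X + new_points, y + [minority] * len(new_points)


# Toy three-class data (hypothetical): 6, 3 and 2 samples per class.
X = (
    [[i * 0.1, i * 0.2] for i in range(6)]
    + [[5.0 + i * 0.1, 5.0] for i in range(3)]
    + [[9.0, 9.0 + i] for i in range(2)]
)
y = ["a"] * 6 + ["b"] * 3 + ["c"] * 2

for pair, X_sub, y_sub in ovo_subsets(X, y):
    X_bal, y_bal = balance_binary(X_sub, y_sub, k=2, rng=random.Random(1))
    print(pair, {c: y_bal.count(c) for c in pair})
```

Each balanced subset could then be passed to a binary classifier, with the per-pair predictions combined by voting. In a MapReduce setting, the per-subset oversampling would presumably be the unit of work distributed across mappers, though the abstract does not specify how the partitioning is done.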
Pages: 403-408
Number of pages: 6