A Novel Hybrid Sampling Algorithm for Solving Class Imbalance Problem in Big Data

Authors
Ahlawat, Khyati [1]
Chug, Anuradha [2]
Singh, Amit Prakash [2]
Affiliations
[1] Indira Gandhi Delhi Tech Univ Women Kashmere Gate, Delhi 110006, India
[2] Guru Gobind Singh Indraprastha Univ, Univ Sch Informat Commun & Technol, Sector 16C, Delhi 110078, India
Keywords
Imbalance data; clustering; big data processing; biasness; sampling; classification; MapReduce; prediction; systems; framework; insight; Hadoop
DOI
10.1142/S2424922X21500054
Chinese Library Classification number
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract
An uneven distribution of classes in a dataset biases standard classifiers toward the majority class. Instances of the class of interest, being few in number, are generally ignored, and their correct classification, which is of paramount interest, is often overlooked when overall accuracy is calculated. Conventional machine learning approaches are therefore refined to address this class imbalance problem. The challenge is more prevalent in big data scenarios because of the high volume of data. This study presents a sampling solution based on cluster computing for handling class imbalance in big data. The proposed approach, the hybrid sampling algorithm (HSA), is assessed with three popular classification algorithms, namely support vector machine, decision tree and k-nearest neighbor, using balanced accuracy and elapsed time as measures. The experimental results are considered promising, with an efficiency gain of 42% over the traditional sampling solution, the synthetic minority oversampling technique (SMOTE). This work demonstrates the effectiveness of the distribution and clustering principle in imbalanced big data scenarios.
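The abstract's point that overall accuracy can hide minority-class errors, and its use of balanced accuracy as the evaluation measure, can be illustrated with a minimal sketch. This is not the authors' code: the function names and the toy labels below are assumptions made for illustration, and balanced accuracy is taken here in its common definition as the mean of per-class recall.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of all instances classified correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class carries equal weight."""
    hits = defaultdict(int)    # correctly classified instances per true class
    totals = defaultdict(int)  # total instances per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


# Hypothetical toy data: 95 majority (0) and 5 minority (1) instances.
y_true = [0] * 95 + [1] * 5
# A trivial classifier that always predicts the majority class.
y_pred = [0] * 100

print(accuracy(y_true, y_pred))           # 0.95
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

The trivial classifier scores 95% overall accuracy while misclassifying every minority instance; balanced accuracy drops to 0.5 because the minority-class recall is zero.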
Pages: 18