Benchmarking framework for class imbalance problem using novel sampling approach for big data

Cited by: 7
Authors
Ahlawat, Khyati [1]
Chug, Anuradha [1]
Singh, Amit Prakash [1]
Affiliations
[1] Guru Gobind Singh Indraprastha University, University School of Information, Communication & Technology, Sector 16C, Delhi 110078, India
Keywords
Class imbalance; SMOTE; Sampling; Big data; Machine learning; MAP REDUCE SOLUTION; CLASSIFICATION; MAPREDUCE; SYSTEMS; HADOOP
DOI
10.1007/s13198-019-00817-6
Chinese Library Classification (CLC) number
T [Industrial Technology]
Discipline classification code
08
Abstract
Traditional machine learning techniques need to be strengthened to cope with the sheer scale of big data if learning is to remain systematic and methodical. An unbalanced distribution of classes in big data, commonly known as imbalanced big data, makes the learning problem considerably harder. Conventional methods are being progressively modified, at both the data level and the algorithmic level, to handle learning from imbalanced datasets in the context of big data. The current study applies a cluster-head-based, data-level sampling solution that inherits the strengths of the K-Means and Fuzzy C-Means clustering approaches. The proposed approach is evaluated with three classifiers, namely Support Vector Machines, Decision Tree and k-Nearest Neighbor, and compared with the conventional SMOTE algorithm. The experiments show promising results, with increases of 8.09% in accuracy and 35.71% in AUC for all imbalanced datasets. This work provides a baseline comparison of data-level solutions for imbalanced classification in the big data scenario and proposes an efficient clustering-based solution for the same problem.
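The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of one way a cluster-head-based, data-level resampler could look. It assumes, purely for illustration, that the majority class is undersampled by replacing it with K-Means centroids ("cluster heads"); the Fuzzy C-Means component and any big-data (for example MapReduce) machinery are omitted. The function names `kmeans_heads` and `undersample_majority` are hypothetical and are not taken from the paper.

```python
import random
from math import dist


def kmeans_heads(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns the k centroids ("cluster heads")."""
    rng = random.Random(seed)
    heads = rng.sample(points, k)  # initial heads: k randomly chosen input points
    for _ in range(iters):
        # Assignment step: group each point with its nearest head.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, heads[i]))
            groups[nearest].append(p)
        # Update step: move each head to the mean of its group.
        new_heads = []
        for head, group in zip(heads, groups):
            if group:
                new_heads.append(tuple(sum(c) / len(group) for c in zip(*group)))
            else:
                new_heads.append(head)  # keep the old head for an empty group
        if new_heads == heads:
            break  # converged: no head moved
        heads = new_heads
    return heads


def undersample_majority(majority, minority, seed=0):
    """Assumed scheme: shrink the majority class to one cluster head per minority point."""
    k = min(len(minority), len(majority))
    return kmeans_heads(majority, k, seed=seed), list(minority)


# Toy data: 200 majority points and 10 minority points in two dimensions.
rng = random.Random(42)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
minority = [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(10)]

balanced_majority, kept_minority = undersample_majority(majority, minority)
print(len(balanced_majority), len(kept_minority))
```

After resampling, both classes have the same number of points, and a classifier such as the ones named in the abstract could then be trained on the balanced set. Whether the published method undersamples, oversamples, or combines both is not stated here, so the choice above should be read as an assumption.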
Pages: 824-835
Number of pages: 12
Related papers
50 in total
  • [1] Benchmarking framework for class imbalance problem using novel sampling approach for big data
    Khyati Ahlawat
    Anuradha Chug
    Amit Prakash Singh
    [J]. International Journal of System Assurance Engineering and Management, 2019, 10 : 824 - 835
  • [2] A Novel Hybrid Sampling Algorithm for Solving Class Imbalance Problem in Big Data
    Ahlawat, Khyati
    Chug, Anuradha
    Singh, Amit Prakash
    [J]. ADVANCES IN DATA SCIENCE AND ADAPTIVE ANALYSIS, 2021, 13 (02)
  • [3] Data Sampling Methods to Deal With the Big Data Multi-Class Imbalance Problem
    Rendon, Erendira
    Alejo, Roberto
    Castorena, Carlos
    Isidro-Ortega, Frank J.
    Granda-Gutierrez, Everardo E.
    [J]. APPLIED SCIENCES-BASEL, 2020, 10 (04)
  • [4] A novel framework for class imbalance learning using intelligent under-sampling
    Naganjaneyulu S.
    Kuppa M.R.
    [J]. Progress in Artificial Intelligence, 2013, 2 (01) : 73 - 84
  • [5] A novel data augmentation approach to fault diagnosis with class-imbalance problem
    Tian, Jilun
    Jiang, Yuchen
    Zhang, Jiusi
    Luo, Hao
    Yin, Shen
    [J]. RELIABILITY ENGINEERING & SYSTEM SAFETY, 2024, 243
  • [6] A Multiple Expert Approach to the Class Imbalance Problem Using Inverse Random under Sampling
    Tahir, Muhammad Atif
    Kittler, Josef
    Mikolajczyk, Krystian
    Yan, Fei
    [J]. MULTIPLE CLASSIFIER SYSTEMS, PROCEEDINGS, 2009, 5519 : 82 - 91
  • [7] Author identification: Using text sampling to handle the class imbalance problem
    Stamatatos, Efstathios
    [J]. INFORMATION PROCESSING & MANAGEMENT, 2008, 44 (02) : 790 - 799
  • [8] Class Imbalance Problem: A Wrapper-Based Approach using Under-Sampling with Ensemble Learning
    Sikora, Riyaz
    Lee, Yoon Sang
    [J]. INFORMATION SYSTEMS FRONTIERS, 2024
  • [9] Effective management of class imbalance problem in climate data analysis using a hybrid of deep learning and data level sampling
    Aarthi, R. J.
    Vinayagasundaram, B.
    [J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2022, 43 (04) : 4187 - 4199
  • [10] Feature Extraction for Class Imbalance Using a Convolutional Autoencoder and Data Sampling
    Salekshahrezaee, Zahra
    Leevy, Joffrey L.
    Khoshgoftaar, Taghi M.
    [J]. 2021 IEEE 33RD INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2021), 2021 : 217 - 223