Ensemble learning based predictive modelling on a highly imbalanced multiclass data

Cited by: 0
Authors
Vasti, Manka [1, 2]
Dev, Amita [3]
Affiliations
[1] Guru Gobind Singh Indraprastha Univ, Univ Sch Informat Commun & Technol, New Delhi 110078, India
[2] GD Goenka Univ, Sch Engn & Sci, Dept Comp Sci & Engn, Gurugram 122103, Haryana, India
[3] Directorate Training & Tech Educ, Delhi 110034, India
Source
Keywords
Ensemble learning; Data augmentation; Earthquake prediction; Cluster based undersam-; CLASSIFICATION; PERFORMANCE;
DOI
10.47974/JIOS-1778
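Chinese Library Classification (CLC) number
G25 [Library science, librarianship]; G35 [Information science, information work];
Discipline classification code
1205; 120501;
Abstract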
Class imbalance in real-world datasets is a major challenge, and domains such as fraud detection, calamity occurrence, and bankruptcy prediction are prone to it because of the nature of the events involved. In this paper, six ensemble machine learning techniques are applied to the undersampled, oversampled, and original datasets, and the results are compared. The results indicate that, among the six ensemble learners applied, the best learner is the Random Forest algorithm (with entropy gain) implemented using ten-fold cross-validation on the SMOTE-oversampled dataset. 0.95 AUC and 0.8689 accuracy, i.e. an increase of 4% in accuracy and substantial
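The oversampling step named in the abstract can be illustrated with a minimal, standard-library sketch of the SMOTE idea: each synthetic minority sample is placed on the line segment between a minority point and one of its nearest same-class neighbours. This is not the authors' implementation. The function name `smote_oversample`, the parameter `k`, and the toy data are assumptions made for illustration, and the Random Forest classifier and ten-fold cross-validation from the abstract are not reproduced here.

```python
import math
import random
from collections import Counter


def smote_oversample(X, y, k=5, seed=0):
    """Oversample every minority class up to the majority-class count.

    Each synthetic point interpolates between a minority sample and one of
    its k nearest same-class neighbours (the core SMOTE idea).
    Hypothetical helper for illustration, not the paper's code.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)

    for label, n in counts.items():
        members = [x for x, lab in zip(X, y) if lab == label]
        if n == target or len(members) < 2:
            continue  # majority class, or too few points to interpolate
        for _ in range(target - n):
            base = rng.choice(members)
            # k nearest same-class neighbours of the chosen base point
            neighbours = sorted(
                (m for m in members if m is not base),
                key=lambda m: math.dist(base, m),
            )[:k]
            other = rng.choice(neighbours)
            gap = rng.random()  # position along the segment base -> other
            X_out.append([b + gap * (o - b) for b, o in zip(base, other)])
            y_out.append(label)
    return X_out, y_out


# Toy three-class data, made up for illustration only.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.2, 0.4], [0.4, 0.2],
     [5.0, 5.0], [5.2, 5.1], [5.1, 4.8],
     [9.0, 0.0], [9.3, 0.4]]
y = ["a"] * 6 + ["b"] * 3 + ["c"] * 2

X_res, y_res = smote_oversample(X, y, k=3)
print(Counter(y_res))
```

After resampling, every class has as many samples as the largest class, and the original samples are kept unchanged at the front of the output. In practice, a library implementation such as `imblearn.over_sampling.SMOTE` combined with scikit-learn's `RandomForestClassifier(criterion="entropy")` would be the more likely choice. Oversampling would normally be applied inside each training fold so that synthetic points do not leak into the validation data.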
Pages: 2141-2164
Page count: 24