Ensemble learning based predictive modelling on a highly imbalanced multiclass data

Cited by: 0
Authors
Vasti, Manka [1, 2]
Dev, Amita [3]
Affiliations
[1] Guru Gobind Singh Indraprastha Univ, Univ Sch Informat Commun & Technol, New Delhi 110078, India
[2] GD Goenka Univ, Sch Engn & Sci, Dept Comp Sci & Engn, Gurugram 122103, Haryana, India
[3] Directorate Training & Tech Educ, Delhi 110034, India
Keywords
Ensemble learning; Data augmentation; Earthquake prediction; Cluster-based undersampling; CLASSIFICATION; PERFORMANCE
DOI
10.47974/JIOS-1778
Chinese Library Classification (CLC) number
G25 [Library science, library work]; G35 [Information science, information work]
Discipline classification codes
1205; 120501
Abstract
Class imbalance in real-world datasets is a major challenge, and domains such as fraud detection, calamity occurrence and bankruptcy prediction are prone to it because of the nature of how these events occur. In this paper, six ensemble machine learning techniques are applied to the undersampled, oversampled and original datasets, and the results are compared. The results indicate that, of the six ensemble learners applied, the best is the Random Forest algorithm (with entropy gain), implemented using ten-fold cross validation on the SMOTE-oversampled dataset. It achieved an AUC of 0.95 and an accuracy of 0.8689, i.e. an increase of 4% in accuracy and substantial…
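The abstract credits its best result to training on a SMOTE-oversampled dataset. As a rough illustration of what SMOTE does on a multiclass dataset, the following is a minimal, stdlib-only Python sketch, not the authors' implementation. It assumes numeric feature vectors and assumes every minority class is oversampled up to the majority-class count. The function name `smote_multiclass`, its parameters `k` and `seed`, and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def _euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote_multiclass(X, y, k=5, seed=0):
    """Hypothetical sketch: oversample each minority class up to the majority count.

    For each synthetic sample, a random member of the class is chosen, one of
    its k nearest same-class neighbours is picked, and a new point is placed at
    a random position on the segment between them (the SMOTE interpolation
    rule). Returns new lists (X_res, y_res) with the original samples first.
    Classes with fewer than two members are left as they are.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_res, y_res = [list(row) for row in X], list(y)

    for label, n in counts.items():
        members = [X[i] for i in range(len(X)) if y[i] == label]
        needed = target - n
        if needed == 0 or len(members) < 2:
            # Nothing to add, or no neighbour to interpolate towards.
            continue
        k_eff = min(k, len(members) - 1)
        # Precompute the k nearest same-class neighbours of each member.
        neighbours = []
        for i, p in enumerate(members):
            others = sorted((j for j in range(len(members)) if j != i),
                            key=lambda j: _euclidean(p, members[j]))
            neighbours.append(others[:k_eff])
        for _ in range(needed):
            i = rng.randrange(len(members))
            j = rng.choice(neighbours[i])
            gap = rng.random()  # uniform in [0, 1)
            base, nb = members[i], members[j]
            X_res.append([b + gap * (q - b) for b, q in zip(base, nb)])
            y_res.append(label)
    return X_res, y_res


# Usage on hypothetical toy data: "high" is the minority class.
X = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [10.0, 10.0], [11.0, 11.0]]
y = ["low", "low", "low", "low", "high", "high"]
X_res, y_res = smote_multiclass(X, y, k=1)
print(Counter(y_res))  # Counter({'low': 4, 'high': 4})
```

In practice, the `imbalanced-learn` package provides a `SMOTE` class applied via `fit_resample`. In scikit-learn, `RandomForestClassifier(criterion="entropy")` evaluated with `cross_val_score(..., cv=10)` corresponds to the Random Forest with entropy and ten-fold cross validation named in the abstract. Performing the oversampling inside each training fold, for example with an `imblearn.pipeline.Pipeline`, keeps synthetic points out of the validation folds.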
Pages: 2141-2164
Number of pages: 24