Ensemble learning based predictive modelling on a highly imbalanced multiclass data

Cited by: 0
Authors
Vasti, Manka [1, 2]
Dev, Amita [3]
Affiliations
[1] Guru Gobind Singh Indraprastha Univ, Univ Sch Informat Commun & Technol, New Delhi 110078, India
[2] GD Goenka Univ, Sch Engn & Sci, Dept Comp Sci & Engn, Gurugram 122103, Haryana, India
[3] Directorate Training & Tech Educ, Delhi 110034, India
Keywords
Ensemble learning; Data augmentation; Earthquake prediction; Cluster based undersampling; Classification; Performance
DOI
10.47974/JIOS-1778
CLC number
G25 [Library science, library services]; G35 [Information science, information work]
Subject classification code
1205; 120501
Abstract
Class imbalance in real-world datasets is a major challenge, and domains such as fraud detection, calamity occurrence, and bankruptcy prediction are prone to it because the events of interest occur rarely. In this paper, six ensemble machine learning techniques are applied to the undersampled, oversampled, and original datasets, and the results are compared. The results indicate that, among the six ensemble learners applied, the best is the Random Forest algorithm (with the entropy/information-gain split criterion) evaluated using ten-fold cross-validation on the SMOTE-oversampled dataset, which achieves 0.95 AUC and 0.8689 accuracy, i.e. an increase of 4% in accuracy and substantial …
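The best configuration in the abstract rests on SMOTE oversampling before training a Random Forest. As a minimal sketch of what SMOTE does to a multiclass dataset, the pure-Python function below raises every smaller class to the size of the largest one by placing synthetic points on the line segment between a sample and one of its k nearest same-class neighbours. It assumes numeric features, Euclidean distance, and a default of k = 5; the function name `smote`, the toy data, and its class labels are hypothetical illustrations, not the authors' code or settings.

```python
import math
import random
from collections import Counter


def smote(X, y, k=5, seed=0):
    """Hypothetical SMOTE-style oversampler for multiclass data.

    Raises every class to the size of the largest class by interpolating
    between a sample and one of its k nearest same-class neighbours.
    X: list of numeric feature vectors; y: list of class labels.
    Returns (X_out, y_out): the original rows followed by synthetic rows.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [list(row) for row in X], list(y)

    for label, n in counts.items():
        if n == target or n < 2:
            # Largest class, or too few samples to interpolate between.
            continue
        members = [row for row, lab in zip(X, y) if lab == label]
        kk = min(k, n - 1)
        # Precompute the kk nearest same-class neighbours of each member.
        neighbours = []
        for i, row in enumerate(members):
            others = sorted(
                (j for j in range(n) if j != i),
                key=lambda j: math.dist(row, members[j]),
            )
            neighbours.append(others[:kk])
        # Generate synthetic rows until this class reaches the target size.
        for _ in range(target - n):
            i = rng.randrange(n)
            j = rng.choice(neighbours[i])
            gap = rng.random()  # position along the segment, in [0, 1)
            synthetic = [a + gap * (b - a) for a, b in zip(members[i], members[j])]
            X_out.append(synthetic)
            y_out.append(label)
    return X_out, y_out


# Hypothetical toy data (labels are illustrative only).
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3],
     [5.0, 5.0], [5.2, 5.1],
     [9.0, 0.0], [9.1, 0.2], [9.3, 0.1]]
y = ["none"] * 4 + ["major"] * 2 + ["minor"] * 3

X_res, y_res = smote(X, y, k=2)
print(dict(sorted(Counter(y_res).items())))  # {'major': 4, 'minor': 4, 'none': 4}
```

In practice a library implementation, such as imbalanced-learn's `SMOTE` followed by scikit-learn's `RandomForestClassifier(criterion="entropy")`, would normally replace a hand-written version. When oversampling is combined with ten-fold cross-validation, it is common to resample only the training folds so that synthetic rows derived from test samples cannot inflate the scores; the abstract does not state which order the authors used.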
Pages: 2141-2164
Number of pages: 24