Ensemble learning based predictive modelling on a highly imbalanced multiclass data

Cited by: 0
Authors
Vasti, Manka [1, 2]
Dev, Amita [3]
Affiliations
[1] Guru Gobind Singh Indraprastha Univ, Univ Sch Informat Commun & Technol, New Delhi 110078, India
[2] GD Goenka Univ, Sch Engn & Sci, Dept Comp Sci & Engn, Gurugram 122103, Haryana, India
[3] Directorate Training & Tech Educ, Delhi 110034, India
Source
Keywords
Ensemble learning; Data augmentation; Earthquake prediction; Cluster based undersam-; CLASSIFICATION; PERFORMANCE;
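DOI
10.47974/JIOS-1778
Chinese Library Classification number
G25 [Library science, librarianship]; G35 [Information science, information work];
Discipline classification code
1205; 120501
Abstract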
Class imbalance is a major challenge in real-world datasets. Domains such as fraud detection, calamity occurrence, and bankruptcy prediction are prone to it because of the nature of the events involved. In this paper, six ensemble machine learning techniques are applied to the undersampled, oversampled, and original datasets, and the results are compared. The results indicate that, among the six ensemble learners applied, the best is the Random Forest algorithm (with entropy gain) implemented using ten-fold cross-validation on the SMOTE-oversampled dataset. It achieves 0.95 AUC and 0.8689 accuracy, i.e. an increase of 4% in accuracy and substantial
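The abstract names SMOTE oversampling without describing it. As a rough illustration only, and not the authors' implementation, the core idea can be sketched with the Python standard library: each synthetic point is placed at a random position on the line segment between a minority-class sample and one of its k nearest same-class neighbours. The function names `smote` and `oversample`, the default `k=5`, and the `seed` parameter are assumptions made for this sketch.

```python
import random
from collections import Counter


def smote(samples, n_new, k=5, rng=None):
    """Create n_new synthetic points from the samples of one minority class.

    Illustrative sketch: each new point is a random interpolation between
    a randomly chosen sample and one of its k nearest same-class neighbours.
    """
    rng = rng or random.Random(0)
    if len(samples) < 2:
        raise ValueError("need at least two samples to interpolate")
    k = min(k, len(samples) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(samples))
        base = samples[i]
        # Rank the other samples by squared Euclidean distance to the base.
        others = [s for j, s in enumerate(samples) if j != i]
        others.sort(key=lambda s: sum((a - b) ** 2 for a, b in zip(base, s)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


def oversample(X, y, k=5, seed=0):
    """Raise every class to the majority-class count (multiclass)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        if count == target:
            continue  # already as large as the majority class
        minority = [x for x, lab in zip(X, y) if lab == label]
        new_points = smote(minority, target - count, k, rng)
        X_out.extend(new_points)
        y_out.extend([label] * len(new_points))
    return X_out, y_out
```

Calling `oversample(X, y)` on a list of feature tuples and a parallel list of labels returns the original rows followed by the synthetic ones. To keep synthetic points out of the held-out fold, oversampling of this kind is usually applied to the training portion of each cross-validation fold only. Whether the paper does so is not stated in the abstract.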
Pages: 2141-2164
Number of pages: 24