BENN: Balanced Ensemble Neural Network for Handling Class Imbalance in Big Data

Cited by: 0
Authors
Ramesh, Sneha Halebeedu [1, 2]
Basava, Annappa [1 ]
Perumal, Sankar Pariserum [1 ]
Affiliations
[1] Natl Inst Technol Karnataka, Dept Comp Sci & Engn, Surathkal, India
[2] Nitte Meenakshi Inst Technol, Dept Informat Sci & Engn, Bengaluru, India
Keywords
concept drift; decision tree regression; decision trees; machine learning; national health dataset; random forest;
DOI
10.1111/exsy.13754
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Class imbalance is a critical challenge in big data analytics. Many machine learning models aim to minimise overall error, so they tend to be biased towards the majority class and perform poorly on the minority class. This paper presents the balanced ensemble neural network (BENN), a novel solution to address class imbalance in big data. BENN combines neural networks with ensemble learning and incorporates class balancing strategies to ensure fair representation of minority classes. The methodology integrates multiple neural networks, each trained on a balanced subset of the data produced with techniques such as the Synthetic Minority Over-sampling Technique (SMOTE) and random undersampling. This integration aims to leverage the strengths of the individual networks while reducing their inherent biases. Extensive experiments across various datasets show that BENN achieves an AUC-ROC score of 0.94, surpassing random forest (0.88), support vector (0.84) and a single neural network (0.80). BENN also outperforms traditional neural network models and standard ensemble methods on accuracy, precision, recall, F1-score and AUC-ROC. The results specifically highlight BENN's effectiveness in accurately classifying instances of minority classes, a notable challenge for many existing models. These findings underscore BENN's potential as a substantial advancement in handling class imbalance within big data environments, offering a promising direction for future research and application in machine learning.
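The idea of training each ensemble member on its own balanced subset can be illustrated with a minimal sketch. This is not the authors' implementation: the base learner here is a single logistic neuron standing in for a full neural network, only random undersampling is shown (SMOTE is omitted), and all function names, the toy data and the hyperparameters are assumptions made for illustration.

```python
import math
import random


def make_toy_data(n_major=200, n_minor=20, seed=0):
    """Hypothetical imbalanced 2-D data: class 0 near (-1, -1), class 1 near (1, 1)."""
    rng = random.Random(seed)
    X = [[rng.gauss(-1, 0.5), rng.gauss(-1, 0.5)] for _ in range(n_major)]
    X += [[rng.gauss(1, 0.5), rng.gauss(1, 0.5)] for _ in range(n_minor)]
    y = [0] * n_major + [1] * n_minor
    return X, y


def undersample(X, y, rng):
    """Random undersampling: keep every minority row, draw as many majority rows."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    idx = minority + rng.sample(majority, len(minority))
    rng.shuffle(idx)
    return [X[i] for i in idx], [y[i] for i in idx]


def sigmoid(z):
    # Clamp the input so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(X, y, epochs=50, lr=0.1):
    """Stand-in base learner (assumption): one logistic neuron trained by SGD."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            g = p - yi  # gradient of the log loss w.r.t. the pre-activation
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b


def predict_proba(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_balanced_ensemble(X, y, n_members=5, seed=0):
    """Train each member on a freshly undersampled, class-balanced subset."""
    rng = random.Random(seed)
    return [train_neuron(*undersample(X, y, rng)) for _ in range(n_members)]


def ensemble_proba(models, x):
    """Soft voting: average the members' minority-class probabilities."""
    return sum(predict_proba(m, x) for m in models) / len(models)
```

Calling `fit_balanced_ensemble(*make_toy_data())` and then `ensemble_proba(models, [2.0, 2.0])` gives the averaged minority-class probability for one point. Because every member sees a different random draw of the majority class, the ensemble uses more of the majority data than any single balanced subset would.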
Pages: 13